Algorithms Weekly by Petr Mitrichev: December 2018

Sunday, December 30, 2018

A probably prime week

This week had one contest from each of the major platforms to wrap up 2018. First off, TopCoder held an experimental SRM 745 on Friday (problems, results, top 5 on the left, analysis). I did not participate myself, but I'd guess that solving 6 problems in 75 minutes felt a bit like SnarkNews Winter/Summer Series. The change of format did not stop Stonefeang from continuing his string of excellent results and winning this round. Well done!

AtCoder held its last contest of the year (and thus the last contest included in the onsite finals selection) on Saturday, the Grand Contest 030 (problems, results, top 5 on the left, my screencast, analysis, onsite finals selection standings, top 8 below on the right). Reading all problems really paid off in this round, as different contestants found different problems the hardest. For me problems B and C were the most difficult — in fact, I could not come up with a solution for any of them. It seems that the top two contestants cospleermusora and tourist had a similar story with problem C, but since it was worth less than problems E and F they ended up on top of the contestants with a different set of 5 problems solved. Congratulations to cospleermusora on the win!

Here is that problem C that was obvious for some and impossible to get for others: you are given a number k which is at most 1000. You need to come up with any nxn toroidal grid (the first row is adjacent to the last row, and the first column is adjacent to the last column), where you choose the number n but it must be at most 500, such that each cell of the grid is colored with one of the k colors, each color is used at least once, and for all pairs of colors i and j all cells with color i must have the same number of neighbors with color j. Can you see a way?

Codeforces held the Good Bye 2018 round on Sunday (problems, results, top 5 on the left, my screencast, analysis). tourist had one more great performance, going through the problems in order and finishing all of them more than 30 minutes faster than the only other contestant to solve everything, MakeRussiaGreatAgain. Congratulations to both!

My round was going pretty well right up to the point when I googled the main idea of the solution for problem G, but then made a bug in the implementation (I forgot that the square roots are always returned modulo n, and not modulo the "remaining" number that I still need to factorize), and instead of looking for it I assumed there's a problem with BigInteger.isProbablePrime in Codeforces and tried to work around it for quite some time. I've found this post with no answers and the post linked from it, and the fact that some of my submissions were getting "Denial of judgement" strongly supported the isProbablePrime failure theory.

Problem H had a pretty convoluted statement which boiled down to: there is a pile with n<=200000 stones, and two players are playing Nim with a fixed set of possible moves a₁, a₂, ..., a_k: in each turn a player can take a number of stones that is equal to one of the a_i, and the player who can't make a move loses. Your goal is to find the nimbers for all values of n between 1 and 200000.

I've tried to solve this problem at the end of the contest, but my mental state after fighting with G for an hour was not ideal :) I realized that I needed to speed up the standard quadratic approach 64 times using bitsets, but could not find a good way to do that. Can you see a way?

Thanks for reading, and check back [hopefully] tomorrow for my take on the best problems of 2018!

Saturday, December 29, 2018

A wave week

The Dec 17 - Dec 23 week contained a surprise extra Open Cup round: Open Cup 2018-19 Grand Prix of Xi'An (results, top 5 on the left, official round results, overall Open Cup standings so far). The Moscow SU team solved three problems in the last hour, including problem A which was solved by only three teams out of 150 attempts, and ended up winning by a problem. Congratulations on the great performance!

Codeforces Round 528 followed on Sunday evening (problems, results, top 5 on the left, analysis). It was mnbvmar's turn to be the only contestant to solve the hardest problem F, and thus win with a huge margin. Congratulations on the victory!

In my previous summary, I have mentioned several problems. The first one came from TopCoder: you are given a rooted tree with 100000 vertices. Each vertex has a demand d_i and a cost c_i. Your goal is to assign some non-negative value v_i to each vertex so that the demand of each vertex is less than or equal to the sum of values on the path from this vertex to the root of the tree, and the sum of c_i*v_i is minimized.

Let's look at the top-most vertices with non-zero value in the solution (such that all their parents and ancestors have zero value). Those vertices are not parents of one another, and thus they satisfy exactly one unit of demand of themselves and their descendants. If we now reduce their value by 1 and repeat, we can split into our solution into "waves" each wave satisfying one unit of demand in all remaining vertices with non-zero demand.

We can also notice that each wave can be constructed as follows: first, start with a wave that contains only the root vertex. Then repeatedly do the following: take any vertex with zero demand that is in the wave, remove it from the wave but add all its children. Each such operation changes the cost of the wave by the sum of costs of the children minus the cost of the removed vertex. Let's call this value the delta of the removed vertex.

Now we'll build the waves one by one while always maintaining the minimum cost wave. Initially such wave is just a root vertex. As soon as the number of waves we've built reaches the demand of a vertex, this vertex becomes available for replacement, which would change the weight of the wave by its delta. If the delta is positive, we don't need to do anything. If the delta is negative, and the vertex is in the minimum cost wave, then we need to do the replacement. If the delta is negative but the vertex is not yet in the minimum cost wave, we will need to do its replacement as soon as it gets into the wave, in other word we should add its delta to the delta of its parent. If the delta of the parent becomes negative, then we propagate it to its parent, and so on, so that at every moment we have only the optimal wave plus some non-negative deltas remaining.

In order for this to run fast, we need to directly propagate to the closest non-zero ancestor instead of going through all parents in the chain to it, which can be accomplished by keeping a disjoint-set union data structure where each set has a vertex with non-zero delta as its representative, and contains all vertices with zero deltas which have it as the closest non-zero ancestor.

This was quite tricky to come up with, but the implementation was rather simple and without any corner cases or data structures more complex than disjoint-set union.

The second problem I mentioned came from AtCoder: you are given n vertices, and n-1 sets of those vertices, each of size at least 2. Your goal is to pick exactly two vertices in each of the n-1 sets in such a way that if we draw the n-1 edges connecting each chosen pair to each other, we get a tree. n is at most 100000, and the total size of the given sets is at most 200000.

The first step is to think when is this actually impossible. The sample cases provided a helpful example: there was a case where there were two vertices that belonged to just one set, and moreover to the same set. Since only one edge will come out of this set, we will not be able to connect both of those vertices to the rest of the graph.

This obstacle can be generalized as follows: if there's a set of vertices of size k which is less than n such that the number of input sets containing at least one of those vertices is less than k, there's no solution.

But wait, we have seen such obstacles before! Consider the bipartite graph where the left part is the input vertices and the right part is the input sets, with edges connecting each set with its members. If there's no obstacle as described above, then by Hall's theorem in this graph there's a matching covering any (n-1)-element subset of the left part.

Let's find any such matching, it covers some (n-1)-element subset of vertices. The fact that all other (n-1)-element subsets can also be covered by a matching means that if we run one more step of the maximum matching algorithm, it must find augmenting paths from the only remaining uncovered vertex to all covered vertices (so that flipping this augmenting path would result in the solution for the corresponding (n-1)-element subset). These augmenting paths form a subtree of the residual network.

Now we can see that if we take any vertex covered by matching, its previous vertex in the augmenting path tree (which will be in the right part, so represent a set), and the set's previous vertex in the augmenting path tree (which will be in the left part, so represent a vertex), then we get two vertices belonging to the corresponding set, and those n-1 pairs are exactly the solution to our problem as they necessarily form a tree!

Finally, there was an Open Cup problem: two players are playing a game on a rooted tree with 100000 vertices. They make moves in turn. In each move of the first player, she colors one of the leaves of the tree black (initially all leaves are colored white). The root of the tree also initially contains a token. In each move of the second player, she can move the token from a vertex to any adjacent vertex. The second player wins if she moves the token to a leaf that is still white. The first player wins when all leaves are colored black and the second player has not yet won. Who will win if both players play optimally?

First, we can notice that it doesn't make sense for the second player to go up, as they will end up in a strictly worse state than before. So for each subtree, there's at most one moment in the game when the token enters this subtree, and then the game continues just in this subtree.

Moreover, suppose that the first player has made x moves into this subtree before the token enters it. Then it's clear that there's some boundary b such that when x<b the second player wins if the token enters this subtree, and when x>=b the first player wins if the token enters this subtree. This finally gives us a linear-time dynamic programming solution: we will compute this boundary b for each subtree going from the bottom to the top.

Thanks for reading, and check back tomorrow!

A Kuhn's week

The Dec 10 - Dec 16 week was livelier than a few previous ones. Codeforces Round 526 started the fun right on Monday (problems, results, top 5 on the left, analysis). Radewoosh continued his string of excellent results, was the only contestant to solve five problems, got the first place with a huge margin and also overtook tourist for the first place in the rating list — very well done!

TopCoder SRM 744 took place on Friday (problems, results, top 5 on the left, my screencast, analysis). Trying to learn from my experience in the TCO final, when approaching the hard problem I have decided to not implement relatively straightforward solutions involving either heavy-light decomposition or propagating piecewise-linear functions up the tree, but instead think a bit more to come up with an easier to implement solution. The strategy has almost backfired as I got my "simple" solution to work with less than two minutes into the contest. However, it did work in the end as a few other contestants going for the straightforward but complicated implementation approaches have failed because of subtle bugs (for example, ainta was above me after the coding phase but his solution failed as he should've used a multiset instead of a set, presumably inside the representation of a piecewise-linear function or something similar).

Here's that problem: you are given a rooted tree with 100000 vertices. Each vertex has a demand d_i and a cost c_i. Your goal is to assign some non-negative value v_i to each vertex so that the demand of each vertex is less than or equal to the sum of values on the path from this vertex to the root of the tree, and the sum of c_i*v_i is minimized. What's the easiest to implement approach, in your opinion?

AtCoder Grand Contest 029 followed on Saturday (problems, results, top 5 on the left, my screencast with commentary, analysis). The problemset was relatively approachable this time, so to win one had to be fast and to minimize incorrect attempts. LHiC executed very well on both fronts and got first place with the margin of 12 penalty minutes. Well done!

I was submitting my last problem around 88-th minute for the first time, but got time limit exceeded on just one testcase. My solution involved Dinic's maximum flow algorithm that I have copy-pasted from a previous solution. Later I submitted the same solution in C++ and it passed in just 0.5s, while the time limit is 4s. Maybe somebody can tell why Java is more than 8x slower this time?

Of course, later I tried submitting Kuhn's maximum matching algorithm in Java instead, which is supposed to be quadratic in the worst case, but it actually passed within the time limit :)

Reducing the problem to maximum matching is also quite fun. Here's the statement: you are given n vertices, and n-1 sets of those vertices, each of size at least 2. Your goal is to pick exactly two vertices in each of the n-1 sets in such a way that if we draw the n-1 edges connecting each chosen pair to each other, we get a tree. n is at most 100000, and the total size of the given sets is at most 200000.

Open Cup 2018-19 Grand Prix of Peterhof took place on Sunday (results, top 5 on the left). Compared to a few previous rounds, the Moscow SU team has cut down dramatically on incorrect attempts, and thus got their first win of the season. Congratulations!

Problem G was a nice dynamic programming exercise. Two players are playing a game on a rooted tree with 100000 vertices. They make moves in turn. In each move of the first player, she colors one of the leaves of the tree black (initially all leaves are colored white). The root of the tree also initially contains a token. In each move of the second player, she can move the token from a vertex to any adjacent vertex. The second player wins if she moves the token to a leaf that is still white. The first player wins when all leaves are colored black and the second player has not yet won. Who will win if both players play optimally? Can you see a way to avoid traversing the entire ~2¹⁰⁰⁰⁰⁰ state space?

Finally, Codeforces ran Avito Cool Challenge 2018 later on Sunday (problems, results, top 5 on the left, analysis). tourist was a bit slower than LHiC but was able to find two challenges and regain the first place in the round, and with it the first place in the rating list. Congratulations!

Thanks for reading, and check back for more!

A lazy week

The Dec 3 - Dec 9 week featured TopCoder SRM 743 (problems, results, top 5 on the left, my screencast, analysis). Stonefeang was quite close to increasing his lead in the overall standings as he was temporarily in first place during the challenge phase after finding a successful challenge. However, ksun48 then found a successful challenge of his own, regained his first place from the coding phase, and joined Stonefeang at the top of the overall leaderboard. Congratulations!

Thanks for reading (was this the shortest post of the year?) and check back for more!

A colorful week

TopCoder SRM 742 was the first event of the Nov 26 - Dec 2 week (problems, results, top 5 on the left, analysis). This was the first round of a new three-month race to a TCO19 onsite spot, and Stonefeang got the best possible start to this race thanks to having the fastest time on two out of three problems. Congratulations!

NEERC 2018, now known as Northern Eurasia Programming Contest 2018, was the main event of the week in Russia and some neighboring countries (problems, results, top 12 on the left, parallel round results, analysis). The world champions from the Moscow SU were trailing during most of the contest, sometimes by more than one problem, but pulled themselves together in the end and still won the contest thanks to being the only team to solve problem H. Congratulations on the great comeback!

The performance of the team White Sprite in the parallel round shows that there is still a lot of room for improvement for the Russian teams, though :)

Thanks for reading, and check back for more!

A center week

The Nov 19 - Nov 25 week had its contests on Sunday, the first one being Open Cup 2018-19 Grand Prix of Dolgoprudny (results, top 5 on the left). For the third time in the last four Open Cup rounds, the team Past Glory was first and the team Moscow SU: Red Santa was second, with quite a huge gap stemming from the large amount of incorrect attempts for the Red Santa. Congratulations to both teams on yet another strong performance!

Codeforces hosted Mail.Ru Cup 2018 Round 3 later that day (problems, results, top 5 on the left, overall results, analysis). Radewoosh continued his fine recent form, solving all problems with 15 minutes to spare and with just two incorrect attempts, compared to V--o_o--V's nine. Congratulations on the win!

In my previous summary, I have mentioned two TCO onsite problems. The first one was constructive: you are given an integer n up to 10¹⁸. You need to return any 4-connected figure which has exactly n different domino tilings, and which fits into a relatively small bounding box: the area of the bounding box must not exceed 14400, and the height of the bounding box must not exceed 13.

It's very easy to build multiplication: given two figures with a and b tilings respectively we can build a figure with a*b tilings by concatenating them possibly with a small insertion in the middle so that no new spurious tilings appear.

However, in order to build arbitrary numbers we usually need both multiplication and addition, and addition is where the main difficulty lies: since the number of domino tilings is a "global" object, it's not clear how we could achieve "either a or b" behavior.

The official analysis shows two great ways to achieve addition, based on the idea that we can build figures that have only one tiling with one "parity" and lots of tilings with other "parity". Please look there for more details (the problem name is "DominoTiling").

The second problem I mentioned was of the more traditional optimization style: you are given n<=200 points on the plane. You need to connect some pairs of points with straight line edges, so that you build exactly n-1 edge and the result forms a tree. What is the minimum possible diameter of the resulting tree (the maximum distance between its vertices, were the length of each edge is equal to the corresponding Euclidean distance)?

Solving this problem required a good understanding of how distances in trees work. More specifically, if we take any diameter of a tree, and find its middle point (which might be a vertex and might be in the middle of an edge), this point will be the center of the tree: all vertices of the tree will be at a distance of at most d/2 from it (where d is the diameter). Moreover, the opposite is also true: if there's a point in the tree such that all vertices are at a distance of at most d/2 from it, then the diameter of the tree is at most d. The important aspect here is that we don't have to consider the distances between pairs of vertices anymore — all that matters are the distances to the center.

Suppose the center lies in the middle of the edge connecting vertices u and v. Then for the part of the tree containing u the distances to the center are going through u. Then we can notice that because Euclidean distance satisfies the triangle inequality, if we rearrange that part of the tree to have all other vertices connected directly to u, the distances to the center will not increase, which means that the diameter will not increase, too.

In other words, we can assume that the optimal tree always looks like an edge u-v, and all other vertices are connected to either u or v.

Moreover, we can see from the above argument that all that matters in the optimal tree is the maximum distance from u to a vertex connected to it, and the maximum distance from v to a vertex connected to it, and thus when deciding whether to connect each other vertex to u or v we just need to think about those two numbers. For example, suppose that w is the vertex farthest from u that is connected to u. Then we can connect to u all vertices that are closer to u than w without increasing the diameter, so the set of vertices connected to u will be a prefix of all vertices sorted by distance to u (during the contest I've sorted by the difference of distances to u and v, which also works).

This allows an O(n³) solution: try all possibilities for u and v, all prefixes of the list vertices sorted by distance to u. Connect the vertices from the prefix to u and the rest to v, and find the minimum diameter of all trees built in this way.

Note that we still need to find the diameters of those trees properly, as we're not guaranteed that the center will be on the u-v edge for any such tree, and finding the diameter properly requires finding the maximum and second maximum edge connected to each of u and v, which can be done by computing prefix and suffix maximums and second maximums.

The "second maximum" part can be avoided if we remember that we don't really care about the cases where the center is on the u-v edge — we just need to make sure that those cases don't incorrectly get a too small diameter computed. For example, Kevin does in fact check in his solution if the diameter is on the u-v edge and throws away the tree if it's not; however, since the distances are floating-point we might get rounding issues here if the optimal center is in fact a vertex, so he had to handle that by adding an epsilon to the comparison. Gennady, on the other hand, just computes the diameter in pessimistic fashion for such trees: if the center ends up being outside the u-v edge, we compute the diameter as 2 times the maximum distance from u and v to other vertices. This will never be less than the actual diameter, and for the cases where the center is u or v this will be equal to it, just as we need.

Thanks for reading, and check back for more!

Friday, December 28, 2018

A 2015 week

TopCoder Open 2018 onsite event in Dallas was headlining the Nov 12 - Nov 18 week. The first semifinal took place earlier on Thursday (problems, results on the left, broadcast, analysis). tourist ended up dominating the proceedings by solving the 1000-pointer in just 15 minutes, but of course the main goal in this round was getting into the top 4. There was quite some drama involved as Egor ended up fifth after resubmitting his solution for the 250-pointer because the original solution could not handle exactly one case correctly: 1x1 grid. Congratulations to tourist, Errichto, Um_nik and ACRush on advancing to the final!

The medium problem in this round was a great combination of simple statement and exciting problem solving, while still being medium difficulty: you are given an integer n up to 10¹⁸. You need to return any 4-connected figure which has exactly n different domino tilings, and which fits into a relatively small bounding box: the area of the bounding box must not exceed 14400, and the height of the bounding box must not exceed 13 (I guess this second condition was required to make it possible to check the output for correctness in reasonable time). Can you see a way?

Semifinal 2 followed very late on Thursday (problems, results on the left, broadcast, my screencast, analysis). Before the semifinal rounds I was telling other contestants that I expect that a fast 250 should be mostly enough to get to 4 out of 8, as you can expect some people to fail the system tests and ending up above them is enough with such generous advancement rules. Well, apparently I had to support my theory by example this time :) You can see from the screencast that I knew that my medium solution was incorrect, but I could not fix it (it turned out that the approach is probably impossible to get to work), and there was just too much implementation involved in the hard problem. During the challenge phase I was very happy when my medium was challenged because the challenger was not Kankuro, and thus I was still solidly in top 4.

The final round took place one day later, on Friday (problems, results on the left, broadcast, my screencast, analysis). There were a couple of important decisions that shaped this round for me.

First, I've decided to implement a quite tedious solution for the easy problem (which was actually incorrect but not caught by system tests). That led to the situation where after submitting the easy I was quite a bit behind, and seeing as the leader tourist has opened the medium problem after solving easy, it felt that going for the hard would increase my chances for the first place.

Then I was actually able to solve the hard problem quite quickly thanks to coming up almost immediately with the key solution idea: that diagonals can be handled independently. This might've been helped by the fact that I have seen this idea before, and in a very similar setting: my previous TopCoder Open victory in 2015 had medium problem with the reference solution using exactly this idea (I did not come up with this idea in 2015, so you can't say that one idea got me two wins, but still the parallels are amusing).

Then I've tried to solve the medium problem, had some good ideas but could not make the solution work, and had almost given up — but then Kevin has submitted his medium with about 9 minutes to go and overtook me in the standings. This was apparently just the extra bit of motivation I needed, as I've then resumed my attempts at solving the medium and managed to submit it with just about 50 seconds remaining in the coding phase! Once again some remarkable parallels to the situation three years ago :)

Here's the medium problem that caused all this excitement: you are given n<=200 points on the plane. You need to connect some pairs of points with straight line edges, so that you build exactly n-1 edge and the result forms a tree. What is the minimum possible diameter of the resulting tree (the maximum distance between its vertices, were the length of each edge is equal to the corresponding Euclidean distance)?

After the dust from the TCO finals had settled a bit, Open Cup Grand Prix of Siberia took place on Sunday (results, top 5 on the left). Both the team Past Glory and the ICPC world champion team from Moscow SU were able to solve 11 problems, but with 13 penalty attempts against 3 and corresponding debugging time the Moscow team had almost 50% higher penalty time. Congratulations to the team Past Glory on the victory!

A -17 week

Codeforces hosted Mail.Ru Cup 2018 Round 2 during the Nov 5 - Nov 11 week (problems, results, top 5 on the left, analysis, my screencast). aid has claimed the victory thanks to solving problem G (that was already a very tricky geometry problem, and then let me quote the official analysis: "One thing you need to note here is that at given constraints you can't do those computations in floating point numbers, because the relative difference between r₁+r₂ and the distance from a vector to a polygon may be as low as 10⁻¹⁷. Therefore, you need to do all calculations in integers."). Moreover, his submission for G came with just 35 seconds remaining in the round. Congratulations on the persistence and the win!

One day later the Open Cup returned with the Grand Prix of Poland (results, top 5 on the left). 4 hours were enough for 13 problems for team Past Glory, while teams MIT THREE and Moscow SU: Red Santa required a bit more time but still solved everything. Congratulations to the top three!

In my previous summary, I have mentioned a Hacker Cup problem: there are w<=80 windows and s<=50 stars, both cells on a grid which represents a skyline. One must draw several buildings, each represented by a rectangle on the grid with line y=0 as the bottom side. Several buildings may intersect. Each window must be inside at least one building, and each star must not be inside any building. We need to find the minimum number of buildings required to satisfy those constraints. Such a problem would be too easy, so there's an extra twist: suppose we remove a random subset of windows (each window is removed independently with probability 0.5), what is the expected value of the minimum number of buildings required to cover all remaining windows without covering the stars?

First, suppose no windows are removed. It turns out that we can solve the problem greedily: let's go from top to bottom until we encounter a window that is not yet covered by any building. It's clear that we need to have some building covering this window, and since there are no uncovered windows above this one, the height of this building should be exactly equal to the height of this window. Then there's no reason not to extend this building as much as possible to the left and to the right, until we reach the boundary of the grid or the columns where there is a star at the same level or below this window. After that we exclude all windows covered by the newly added building and continue going from top to bottom.

Now we need to understand how we can run 2^w instances of this greedy at the same time, one per possible subset of windows that are removed. The usual way to run so many greedys in parallel is dynamic programming: we need share large parts of the greedy execution between different problem instances. But what can we share?

Consider the bottom-most star. While the greedy operates at its level or above, the computations to the left and to the right of it are completely independent, and when the greedy passes its level, there are no more relevant stars, and thus the only thing that matters is whether there's any yet uncovered window below that bottom-most star level: if there is at least one, we'll need to add exactly one building, otherwise we're already done.

So far, this seems like a promising way to organize the dynamic programming as we have split the problem into two almost independent sub-problems. We can repeat the same step again for each of the sub-problems by finding the bottom-most star in each. This way our sub-problem can be viewed as a segment together with a level, such that all stars within the segment are above that level, and we have only O(s) such sub-problems as we get two new sub-problems per one star.

However, there's a catch: for windows that are below the level, we need to know if they have been covered from above or not, in order to determine whether we need to add one to the answer on subsequent steps. The first idea of keeping a set of windows which are not yet covered results in exponential state space.

Here comes the key idea that puts everything together: consider one of our sub-problems, a segment together with a threshold level such that there are no stars below that level. Suppose we have already decided which windows there are removed and which are not, including the windows there that are below the threshold level, and ran the greedy until the threshold level. Then it turns out that instead of remembering the subset of windows below the threshold level which are not removed and not covered, it is enough to remember the top-most such window (and left-most of those if there are several top-most ones), because any building that will cover the top-most window and have height below the threshold level will also cover all those windows!

This means our dynamic programming has just O(w*s) states, and thus easily runs in time almost no matter how we implement the transition.

I have also mentioned a Codeforces problem: you are given a tree with nodes numbered from 1 to n (n<=1000) and a subset of nodes in it which forms a connected subtree, given as a list of numbers of the nodes. There's also a second, hidden numbering of all nodes with distinct numbers from 1 to n, and a second subset of nodes which also forms a connected subtree, which is given to you as a list of hidden numbers of its nodes. However, you don't know how the hidden numbers of the nodes correspond to the visible ones. You can ask an oracle questions of two types:

What is the hidden number of the node with [visible] number x?
What is the visible number of the node with hidden number y?

Your goal is to find any vertex that belongs to both given subsets, or report that there isn't any. You can ask at most 5 questions.

The solution here is beautifully simple. Let's take any hidden number that belongs to the second subset, and ask about its visible number, this way we find one node belonging to the second subset in our tree. Now let's find the node in the first subset that is closest to the known node of the second subset (since we're in a tree and the subset is connected, there will be exactly one closest node). It turns out that in case there is intersection between the two subsets, this node will belong to the intersection (again because the subsets are connected)!

All that remains is to ask the oracle about the hidden number of this node, and to check if it belongs to the second subset. We end up requiring just two questions to the oracle no matter how big the tree is.

Thanks for reading, and check back for more!

Thursday, December 27, 2018

A week with Ethan

TopCoder SRM 741 was the first event of the Oct 28 - Nov 4 week (problems, results, top 5 on the left, analysis). xudyh and tourist came to this event having the same number of points in the TCO19 qualification race (link, click on "Stage 1"), and this was the last round of the first stage so one of them would become the first TCO19 finalist. tourist had much better tie-breaker before this round, and has made it a habit to be in top 10, so the qualification boiled down to: if xudyh can win, he would qualify, otherwise it would most likely be tourist. xudyh came quite close, but was stopped by the great performance of fjzzq2002. Congratulations to xudyh on the great performance, to fjzzq2002 on the win and to tourist on making it to the TCO19 finals so early!

Facebook Hacker Cup 2018 onsite final round took place on Saturday (problems, results, top 5 on the left, analysis). There was quite tense competition until the very end, but ultimately Mikhail's (LHiC) speed could not be matched. Congratulations on the huge victory!

I've spent a huge chunk of the contest thinking that I almost have a solution to the problem "City Lights" (as in, "now it should finally start working in a few minutes after I handle this surely last corner case"), however it stayed as "almost" until the end of the contest. Here's what the problem was about: there are w<=80 windows and s<=50 stars, both cells on a grid which represents a skyline. One must draw several buildings, each represented by a rectangle on the grid with line y=0 as the bottom side. Several buildings may intersect. Each window must be inside at least one building, and each star must not be inside any building. We need to find the minimum number of buildings required to satisfy those constraints. Such a problem would be too easy, so there's an extra twist: suppose we remove a random subset of windows (each window is removed independently with probability 0.5), what is the expected value of the minimum number of buildings required to cover all remaining windows without covering the stars?

Codeforces ran another onsite contest on Sunday, the Lyft Level 5 Challenge Final Round (problems, results, top 5 on the left, parallel round results, analysis). tourist was unstoppable this time, solving all problems and doing it 20 minutes faster than scott_wu, the only other onsite contestant to solve everything. Congratulations to both!

Problem B in this round was a very nice interactive problem: you are given a tree with nodes numbered from 1 to n (n<=1000) and a subset of nodes in it which forms a connected subtree, given as a list of numbers of the nodes. There's also a second, hidden numbering of all nodes with distinct numbers from 1 to n, and a second subset of nodes which also forms a connected subtree, which is given to you as a list of hidden numbers of its nodes. However, you don't know how the hidden numbers of the nodes correspond to the visible ones. You can ask an oracle questions of two types:

What is the hidden number of the node with [visible] number x?
What is the visible number of the node with hidden number y?

Your goal is to find any vertex that belongs to both given subsets, or report that there isn't any. You can ask at most 5 questions.

Thanks for reading, and check back for more!

A first hour week

The Oct 22 - Oct 28 week had two Codeforces rounds. On Wednesday, it was time for Round 518 (problems, results, top 5 on the left, analysis). Radewoosh could not pass pretests in problem D even after 8 attempts, but it turned out to be not necessary for the first place as ksun48 has failed the same problem on systests. Congratulations to Radewoosh on the win!

Codeforces Round 519 took place on Sunday (problems, results, top 5 on the left, analysis, my screencast). With 6 problems solvable under an hour and the last problem which was too difficult to crack during the contest, the fight for the first place quite naturally came down to challenges. scott_wu extracted the most from the available opportunities and won by more than 100 points — well done!

Thanks for reading (not so much to read this time :)), and check back for more!

A rolling week

The Oct 15 - Oct 21 week started with Codeforces hosting Mail.Ru Cup 2018 Round 1 (problems, results, top 5 on the left, analysis). Despite the round's slightly longer 2:30 duration, the usual two hours were enough for mnbvmar to claim the first place. Congratulations!

TopCoder SRM 740 followed on Saturday (problems, results, top 5 on the left, analysis). With the win in this round, xudyh has joined tourist at the top of the TCO19 points scoreboard (link, click on "Stage 1") and made sure it's just the two of them fighting for the single TCO19 direct qualification spot in the final October SRM a bit later. Well done!

My solution for the hard problem has failed the system test, and I was trying to recall why when I'm writing this post now, two months later. I did not remember, so I've decided to look at the diff between my solution during the round and the one I got accepted afterwards:

31,33c31,33
< matrix[limbo][start * 2 + startState] += old;
< if (matrix[limbo][start * 2 + startState] >= MODULO)
< matrix[limbo][start * 2 + startState] -= MODULO;
---
> matrix[limbo][start + 2 * startState] += old;
> if (matrix[limbo][start + 2 * startState] >= MODULO)
> matrix[limbo][start + 2 * startState] -= MODULO;

When I first saw this diff today, there were also a few spurious formatting diffs here and there, and I could not actually notice that this one is meaningfully different. I guess this is what happened during the round as well, and is similar to what happens when reading those "jumbled" English sentences: we most likely don't read the programs character by character or token by token, but rather read the statements as a whole, and thus can assume parts which are not there. Anyway, enough psychology speculation, the diff is: the first block has "* 2 +", and the second one has "+ 2 *" (the former is correct).

Finally, one more Codeforces round, the 517-th one, took place on Sunday (problems, results, top 5 on the left, analysis). It was Radewoosh this time who earned enough points for the first place just one hour into the contest, which was a bit more risky in this round given that ainta did in fact solve a hard problem that Radewoosh did not. Congratulations on the victory!

In my previous summary, I have mentioned a few problems. First one was from TopCoder: you are given a number n between 3 and 500. You need to generate any simple (not necessarily convex) polygon with n vertices, where all vertices have integer coordinates between 1 and 25, inclusive, and which has the smallest possible area among such polygons.

The mathematical part of the solution is rather standard: Pick's theorem tells us that to minimize the area it is necessary and sufficient to have no integer points inside the polygon, and no extra integer points except the n vertices on the boundary of the polygon.

Now we need to figure out a way to draw such polygon without extra integer vertices on a relatively small grid. This picture by fjzzq2002 demonstrates a very good approach to do that.

The second mentioned problem from AtCoder has a very good editorial (problem D "Chords" there), I don't have much to add to it.

Finally, there was an Open Cup problem which is actually not explained in the corresponding analysis: you are given a 50x50 grid where some cells are empty, and some are walls. There are also walls surrounding the grid. Some empty cells are marked as targets, and there's also a ball in one of the empty cells. You can move the ball in each of the four cardinal directions, but once it starts rolling, it rolls in the same direction until it hits the wall, at which point you can pick a new direction, and so on. Can you roll the ball in such a way that it passes through all targets in some order?

Here's the key observation: suppose we roll the ball to the right at some point. Then we can immediately roll it to the left and to the right again, end up in exactly the same position, but be guaranteed to have covered an entire horizontal segment of the grid. Similarly, rolling the ball up or down can be viewed as traversing an entire vertical segment of the grid.

So instead of thinking in terms of moving the ball from cell to cell, we can enumerate all non-extendable horizontal and vertical segments in the grid, and think of our movement as jumping from one such segment to another. For example, from a horizontal segment we can jump to one of the two vertical segments containing the endpoints of the horizontal segment.

The key benefit from such formulation comes from the fact that every target belongs to just two such segments: one horizontal and one vertical. The number two points us towards reducing this problem to 2-SAT: let's introduce one variable per segment, which will be true if our path visits this segment, and false otherwise. The condition about visiting all targets can now be encoded as a set of "one of those two variables must be true" clauses, which are permissible in 2-SAT.

The other condition that we must enforce is that all visited segments can be put on one path that starts from the starting position. In order to achieve that, we add two types of restrictions:

If a segment is not reachable from the starting position, the corresponding variable must be false.
For each pair of segments such that none is reachable from the other, at least one of the two corresponding variables must be false.

The second restriction guarantees us a total linear order of the visited segments, which is exactly what a path is, and the first one makes sure this path can start in the right place. Both restrictions are already expressed in the form of 2-SAT clauses.

Our 2-SAT problem has O(n²) variables and O(n⁴) clauses, where n=50 is the size of the grid. Since 2-SAT is solved in linear time, this is fast enough. One more potentially slow part is discovering all restrictions of the second type as we need to find the transitive closure of a graph with O(n²) vertices. However, the number of edges in the graph is also O(n²), so the transitive closure can be found in O(n⁴) as well.

Thanks for reading, and check back for more!

Tuesday, December 25, 2018

A bmerry week

TopCoder SRM 739 was the first round of Oct 8 - Oct 14 week (problems, results, top 5 on the left, analysis, my screencast). Nobody was able to solve all three problems correctly this time, but pashka's very fast solution to the hard problem earned him a comfortable first place. Congratulations on the win!

Here's that hard problem: you are given a number n between 3 and 500. You need to generate any simple (not necessarily convex) polygon with n vertices, where all vertices have integer coordinates between 1 and 25, inclusive, and which has the smallest possible area among such polygons.

AtCoder Grand Contest 028 took place on Saturday (problems, results, top 5 on the left, analysis, my screencast). With just a few rounds left in the year, the fight for the top 8 in the AtCoder Race Ranking is entering its deciding phase. Not so much for tourist, who is head and shoulders ahead of everybody else, and who has further cemented his Race Ranking lead with a win here. Well done!

Problem D in this round was a cute combinatorics exercise. You are given 2n points on the circumference of a circle, n is up to 300. We're going to form n pairs of points, such that each point is in exactly one pair, and draw straight line segments between the points in each pair. After all segments are drawn, we're going to count the number of connected components of segments (two segments are connected directly if they intersect). We have already formed k such pairs, and need to form the remaining n-k. You need to find the sum of the numbers of connected components over all possible ways to form the remaining n-k pairs.

Open Cup 2018-19 Grand Prix of Korea followed on Sunday (results, top 5 on the left, partial analysis). Just like last week, team Past Glory did not really need the whole 5 hours — congratulations on another convincing win!

Our team, on the other hand, used all available time, and solved the last problem with less than a minute remaining on the clock. You can see the diff between the previous incorrect submission and the last-minute correct one on the right. Just imagine: there's almost no time left, we got wrong answer, have implemented a stress-test and found a discrepancy, but there's no time to debug. What could be wrong in this code, anyway? Let's try swapping left and right — and voilà, stress-test passes, and there're still a few seconds to submit :)

Problem B in this round left me with a feeling "wow, how come such problem hadn't been set before!": you are given a 50x50 grid where some cells are empty, and some are walls. There are also walls surrounding the grid. Some empty cells are marked as targets, and there's also a ball in one of the empty cells. You can move the ball in each of the four cardinal directions, but once it starts rolling, it rolls in the same direction until it hits the wall, at which point you can pick a new direction, and so on. Can you roll the ball in such a way that it passes through all targets in some order?

Codeforces Round 516 ran in parallel with the Open Cup (problems, results, top 5 on the left, analysis). With nobody able to solve everything, choosing the right problems to solve was a necessary part of winning, and mnbvmar found the right set of problems and solved them faster than everybody else. Congratulations on the victory!

This round also brought somewhat sad news: Bruce, who placed second and regained his nutella status, is retiring from rated contests on a high note. I've certainly enjoyed competing against Bruce for a really long time (and competing in a team once, as we solved a World Finals mirror together a few years ago!), so I'm still hoping to chat to him at the onsites which don't have a rating system :)

In my previous summary, I have mentioned an IPSC problem: you are given a prime number p up to 10¹⁰⁰, a number k, and n<=100 triples of numbers a_i, b_i, m_i. Your goal is to count how many numbers x between 0 and p-1 satisfy the equation ((a_i * x + b_i) mod p) mod m_i = ((a_i * k + b_i) mod p) mod m_i for at least one i. Moreover, you need just the approximate count: an error of up to 1% is fine. There are 100 testcases to solve, and the duration of the contest is 5 hours.

A rather straightforward first step in solving this problem is to figure out how to solve each individual modular equation. I won't go into the details as it's not the most exciting part of the problem, so I will just state that we can find the number of solutions, test if a given number is a solution, and describe all solutions in a way that allows sampling a uniformly random solution, for example.

Now we have 100 huge sets of numbers that we know the size of and can test and generate, and need to estimate the size of their union with at most 1% error. The allowed error probability strongly hints at some kind of Monte-Carlo approach. The most straightforward Monte-Carlo approach would be: just repeatedly take a random number between 0 and p-1, and check if it belongs to one of the sets. This can give us an estimate with an error that's certainly less than 1% of our sampling pool, in other words less than 1% of p.

To see it more formally, we can say that in each trial, we sample from a random variable that is equal to either 0 or p, and is equal to p with probability a/p where a is our answer, and then give a mean of k such samples as our estimate. The central limit theorem tells us that such mean tends to behave like a normal variable with mean equal to the mean of individual sample, which is p*a/p=a, just as we need. The central limit theorem also tells us how to estimate the error: the standard deviation of that normal variable will be equal to the standard deviation of the individual sample divided by the square root of k. The standard deviation of the individual sample is sqrt(p*p*a/p-a*a)=sqrt(a*(p-a)), and we can assume that a normal variable will never exceed, say, its 6 standard deviations, so our error will be at most 6*sqrt(a*(p-a)/k). Unfortunately when a is very small relative to p, this is still very big relative to a.

At this point during the contest I've tried to come up with some ad-hoc fixes using common sense, and did not succeed. However, I should've simply continued using more formal statistics! The problem in the above approach is that we're sampling from the individual distribution with too big standard deviation — so let's come up with another one. Instead of sampling from all p possible numbers, let's use the fact that we can generate solutions for each equation: let's sample a random (i, x) pair where x is a solution to the i-th equation, in such a way that every valid (i, x) pair is equally likely. We can do that by first sampling i proportionally to the number of solutions of each equation, and then sampling a uniformly random solution x to the i-th equation.

Now we need to come up with a random variable with the mean equal to our answer. Let t_i be the number of solutions of the i-th equation, and let t be the sum of t_i. Given a sample (i, x) let's make our random variable equal to t if there's no other equation with a smaller index j < i such that x is also a solution of the j-th equation, or 0 otherwise. It's not hard to see that for each distinct x from the union of all solutions our random variable will be equal to t for exactly one i, and thus the mean of our random variable is equal to a*t/t=a, just as required (Another way to achieve the same mean would be to make our random variable equal to t divided by the number of equations that x is a solution for).

The standard deviation of this variable is equal to sqrt(a*(t-a)), and we know that t is at most a*n, so it is at most sqrt(a*a*(n-1))<=a*sqrt(n). Now we need to pick such k that 6*a*sqrt(n)/sqrt(k)<=0.01*a, or 600*sqrt(n)<=sqrt(k), or k>=n*360000, so as n=100 we get k>=36 million. This is already doable in a few seconds per testcase, but in reality we need even less attempts as 6 standard deviations is a very conservative bound.

During the contest I could not come up with such sampling idea, and instead tried to return the result as a sum of estimates of sizes of first set, second set minus first set, third set minus first two sets and so on, where each particular estimate is built using essentially the above approach. However, it meant I needed to decide how to allocate my samples between the n estimates, and I could not get that part right (from the above solution, it is clear that I simply needed to have sample counts proportional to t_i — and indeed, I did get my solution accepted after the end with this fix).

Thanks for reading, and check back for more!