Sunday, January 13, 2019

A Dilworth week

This week was light on contests, so let me talk about the Codeforces problems I've mentioned in last week's summary. The first one was: you are given a permutation of size n <= 100000, and you need to split it into subsequences such that each subsequence is either monotonically decreasing or monotonically increasing. The number of subsequences needs to be small, but does not need to be minimal. More precisely, your solution should have the optimal worst case performance for any given n: the number of subsequences should not exceed the maximum number of subsequences necessary for any permutation of size n.

One fact that definitely looks helpful is the Dilworth's theorem: it tells us that we either have a long increasing subsequence, or can split our sequence into few decreasing subsequences. More formally, let's define f(n) to be the maximum number of subsequences our solution produces for any permutation of size n. Applying the Dilworth's theorem allows to build a solution with the following recurrence on f():

f(n)=maxk(min(k,1+f(n-k))

where the inner minimum corresponds to the fact that we can either split everything into k decreasing subsequences, or remove an increasing subsequence of length k and apply our solution recursively.

Printing a few first values of f(n) computed the recurrence above, we get:

f(1) = 1
f(2) = 1
f(3) = 2
f(4) = 2
f(5) = 2
f(6) = 3
f(7) = 3
f(8) = 3
f(9) = 3
f(10) = 4
f(11) = 4
f(12) = 4
f(13) = 4
f(14) = 4
f(15) = 5
f(16) = 5
...

Here we can notice that the smallest value of n such that f(n)=k is the k-th triangular number: 1+2+...+k=k*(k+1)/2. Having noticed it, it's relatively easy to prove it by induction.

So we have a solution using Dilworth's theorem, but is it good enough for the problem? Does it have the same worst case performance as the best possible solution for any given n?

The answer is yes, and the triangular numbers together with the recurrence above give us a hint how to see it. We need to come up with some permutation of size 1+2+...+k for which we can prove that it can't be split into less than k increasing or decreasing subsequences. 

Here is one such sequence for k=5: 1, 3, 2, 6, 5, 4, 10, 9, 8, 7, 15, 14, 13, 12, 11.  In other words, we take a decreasing segment of length 1, then append a decreasing segment of length 2 with all numbers bigger than the previous segment, then append a decreasing segment of length 3 with all numbers bigger than the previous segment, and so on until a segment of length k. Consider any split of this sequence into increasing and decreasing subsequences. Suppose it has x increasing subsequences. Each increasing subsequence covers at most one element of each decreasing segment, so for k-x segments with more than x elements at least one number will be left uncovered. But a decreasing subsequence can't have numbers from two different segments, so we will need at least k-x decreasing subsequences to cover the rest, and the total will be at least k.

This proves that our solution does in fact have the optimal worst case performance for any given n. One small thing that the Wikipedia article doesn't give is a constructive way to apply the theorem: finding the longest increasing subsequence is a well-known algorithm, but how do we actually split the sequence into as many decreasing subsequences quickly? Some quick googling helped me find the answer during the contest: in the longest increasing subsequence algorithm, let's call the length of the longest increasing subsequence ending in each element the level of this element. It turns out that the elements of the same level always form a non-increasing (and since we have a permutation, decreasing) subsequence, as otherwise the element to the right would have a higher level, so we just split by level.

The second problem I mentioned turned out to be a repeat from 9 years ago: you are given a nxm matrix such that n*m<=300000, with one character from the set A, C, G or T in each cell. You need to change as few characters as possible (to other characters from the set) in such a way that each 2x2 submatrix has all four different characters.

Given that there are consequently two editorials for it, I don't think I can add much :)

Thanks for reading, and check back next week!

Sunday, January 6, 2019

A Radewoosh week

Codeforces did not take any breaks, and held two contests in the first week of 2019. The first one was appropriately named "Hello 2019" (problems, results, top 5 on the left, my screencast, analysis). V--o_o--V took a gamble and started with problem G which looks doable on the surface but takes quite a lot of time to get right. This was not optimal in terms of the number of points, but it or the five incorrect attempts did not matter in the end as nobody else was able to solve all problems. Congratulations to V--o_o--V!

I was actually quite close to getting all problems, as I've fixed the last bug in my solution for H just 30 seconds after the end of the contest (you can still see it on the screencast). And given the unexpectedly low point total of V--o_o--V, that would be enough for the first place :)

Problem E mostly relied on a well-known theorem, but had an interesting twist on top of it: you are given a permutation of size n <= 100000, and you need to split it into subsequences such that each subsequence is either monotonically decreasing or monotonically increasing. The number of subsequences needs to be small, but does not need to be minimal. More precisely, your solution should have the optimal worst case performance for any given n: the number of subsequences should not exceed the maximum number of subsequences necessary for any permutation of size n.

Codeforces Round 530 took place one day later (problems, results, top 5 on the left, my screencast, analysis so far partially in Russian). Problem E required a lot of careful implementation, and the speed of that implementation was the deciding factor in the standings. Of the four contestants finishing before the round ended, Radewoosh was the fastest. Well done!

I found problem B quite cute: you are given a nxm matrix such that n*m<=300000, with one character from the set A, C, G or T in each cell. You need to change as few characters as possible (to other characters from the set) in such a way that each 2x2 submatrix has all four different characters. Can you see a way?

In my previous summary, I've mentioned a couple of problems. The first one came from AtCoder: you are given a number k which is at most 1000. You need to come up with any nxn toroidal grid (the first row is adjacent to the last row, and the first column is adjacent to the last column), where you choose the number n but it must be at most 500, such that each cell of the grid is colored with one of the k colors, each color is used at least once, and for all pairs of colors i and j all cells with color i must have the same number of neighbors with color j.

During the round, I could only come up with approaches that produce a number of colors that is either smaller than n, or divisible by n. At some point I had an idea to write a backtracking solution that would find me any solution that does not have these properties, hoping that would help come up with its generalization. In retrospect, that might have done it, as the following approach (which seems to be the only one that works) does look recognizable from one example: let's pick an even n, and split the grid into n (toroidal) diagonals. For each diagonal, we either color it with one color, or with two alternating colors, thus making it possible to get any number of colors between n and 2n. Since each element of a diagonal has exactly two neighbors from each neighboring diagonal, the required properties hold.

The other problem came from Codeforces. I've cited a simplified version of the statement: there is a pile with n<=200000 stones, and two players are playing Nim with a fixed set of possible moves a1, a2, ..., ak: in each turn a player can take a number of stones that is equal to one of the ai, and the player who can't make a move loses. Your goal is to find the nimbers for all values of n between 1 and 200000.

I have forgot to mention one important additional property in this simplification: that the nimbers are guaranteed to be not too big (below 100), which is actually important for the solution below to be fast — sorry for that!

Let's put all possible moves in one bitmask m, and also store a bitmask for each nimber value that represents which pile sizes have a move to a position with that nimber value. Those bitmasks start as empty. In order to determine the nimber for the next pile size, we go through those bitmasks until we find one that has a zero in the corresponding position. Then we need to update the bitmasks for higher pile sizes, and the key trick is that we only need to update one of them: the one corresponding to the newly determined nimber, and the update is simply applying bitwise or with the move bitmask m shifted left by the current pile size. This means that the solution runs in O(n2/64+n*max_nimber) (I know, this is not a perfect use of the O-notation, but I hope you understand what I mean), which is fast enough.

Thanks for reading, and check back next week!

And the best problem of 2018 is...

According to the vote, the best problem of 2018 is the Open Cup problem about rolling a ball in a maze while collecting all targets in some order, by jh05013, with solution in this post. My vote also went to this problem.

Here's its statement in an abridged form once again: you are given a 50x50 grid where some cells are empty, and some are walls. There are also walls surrounding the grid. Some empty cells are marked as targets, and there's also a ball in one of the empty cells. You can move the ball in each of the four cardinal directions, but once it starts rolling, it rolls in the same direction until it hits the wall, at which point you can pick a new direction, and so on. Can you roll the ball in such a way that it passes through all targets in some order?

Congratulations to jh05013, to the Open Cup, and thanks to all problemsetters of 2018!

More precisely, since there were only 91 people voting, I'd say we're about 60% confident that the best problem of 2018 is that one :) Here's the Colab where I try estimate that confidence. Of course that number is meaningless without explaining the assumed underlying model yada yada, but I hope it's good enough for a ballpark estimate. If people who actually, unlike myself, understand how statistics and polling work are reading this: is that a good way to get confidence numbers for a poll? What is a better way?

Thanks for reading, and check back later today for this week's summary!

Wednesday, January 2, 2019

Best problems of 2018

Just like last year, I have reviewed the problems I've highlighted in this blog, and have picked the shortlist of five problems that I think were the best in 2018 (in chronological order):
Which one do you think is the very best?


There are other discussions and votes about the best problem of 2018, check those out :) In that second post, snarknews is also announcing the Prime New Year contest, which consists of problems given at top-level team competitions in 2018 but not solved by anybody there. If you like real challenge and don't have much to do during the holidays, here's your link.

Happy New Year!

Sunday, December 30, 2018

A probably prime week

This week had one contest from each of the major platforms to wrap up 2018. First off, TopCoder held an experimental SRM 745 on Friday (problems, results, top 5 on the left, analysis). I did not participate myself, but I'd guess that solving 6 problems in 75 minutes felt a bit like SnarkNews Winter/Summer Series. The change of format did not stop Stonefeang from continuing his string of excellent results and winning this round. Well done!

AtCoder held its last contest of the year (and thus the last contest included in the onsite finals selection) on Saturday, the Grand Contest 030 (problems, results, top 5 on the left, my screencast, analysisonsite finals selection standings, top 8 below on the right). Reading all problems really paid off in this round, as different contestants found different problems the hardest. For me problems B and C were the most difficult — in fact, I could not come up with a solution for any of them. It seems that the top two contestants cospleermusora and tourist had a similar story with problem C, but since it was worth less than problems E and F they ended up on top of the contestants with a different set of 5 problems solved. Congratulations to cospleermusora on the win!

Here is that problem C that was obvious for some and impossible to get for others: you are given a number k which is at most 1000. You need to come up with any nxn toroidal grid (the first row is adjacent to the last row, and the first column is adjacent to the last column), where you choose the number n but it must be at most 500, such that each cell of the grid is colored with one of the k colors, each color is used at least once, and for all pairs of colors i and j all cells with color i must have the same number of neighbors with color j. Can you see a way?

Codeforces held the Good Bye 2018 round on Sunday (problems, results, top 5 on the left, my screencast, analysis). tourist had one more great performance, going through the problems in order and finishing all of them more than 30 minutes faster than the only other contestant to solve everything, MakeRussiaGreatAgain. Congratulations to both!

My round was going pretty well right up to the point when I googled the main idea of the solution for problem G, but then made a bug in the implementation (I forgot that the square roots are always returned modulo n, and not modulo the "remaining" number that I still need to factorize), and instead of looking for it I assumed there's a problem with BigInteger.isProbablePrime in Codeforces and tried to work around it for quite some time. I've found this post with no answers and the post linked from it, and the fact that some of my submissions were getting "Denial of judgement" strongly supported the isProbablePrime failure theory.

Problem H had a pretty convoluted statement which boiled down to: there is a pile with n<=200000 stones, and two players are playing Nim with a fixed set of possible moves a1, a2, ..., ak: in each turn a player can take a number of stones that is equal to one of the ai, and the player who can't make a move loses. Your goal is to find the nimbers for all values of n between 1 and 200000.

I've tried to solve this problem at the end of the contest, but my mental state after fighting with G for an hour was not ideal :) I realized that I needed to speed up the standard quadratic approach 64 times using bitsets, but could not find a good way to do that. Can you see a way?

Thanks for reading, and check back [hopefully] tomorrow for my take on the best problems of 2018!

Saturday, December 29, 2018

A wave week

The Dec 17 - Dec 23 week contained a surprise extra Open Cup round: Open Cup 2018-19 Grand Prix of Xi'An (results, top 5 on the left, official round results, overall Open Cup standings so far). The Moscow SU team solved three problems in the last hour, including problem A which was solved by only three teams out of 150 attempts, and ended up winning by a problem. Congratulations on the great performance!

Codeforces Round 528 followed on Sunday evening (problems, results, top 5 on the left, analysis). It was mnbvmar's turn to be the only contestant to solve the hardest problem F, and thus win with a huge margin. Congratulations on the victory!

In my previous summary, I have mentioned several problems. The first one came from TopCoder: you are given a rooted tree with 100000 vertices. Each vertex has a demand di and a cost ci. Your goal is to assign some non-negative value vi to each vertex so that the demand of each vertex is less than or equal to the sum of values on the path from this vertex to the root of the tree, and the sum of ci*vi is minimized.

Let's look at the top-most vertices with non-zero value in the solution (such that all their parents and ancestors have zero value). Those vertices are not parents of one another, and thus they satisfy exactly one unit of demand of themselves and their descendants. If we now reduce their value by 1 and repeat, we can split into our solution into "waves" each wave satisfying one unit of demand in all remaining vertices with non-zero demand.

We can also notice that each wave can be constructed as follows: first, start with a wave that contains only the root vertex. Then repeatedly do the following: take any vertex with zero demand that is in the wave, remove it from the wave but add all its children. Each such operation changes the cost of the wave by the sum of costs of the children minus the cost of the removed vertex. Let's call this value the delta of the removed vertex.

Now we'll build the waves one by one while always maintaining the minimum cost wave. Initially such wave is just a root vertex. As soon as the number of waves we've built reaches the demand of a vertex, this vertex becomes available for replacement, which would change the weight of the wave by its delta. If the delta is positive, we don't need to do anything. If the delta is negative, and the vertex is in the minimum cost wave, then we need to do the replacement. If the delta is negative but the vertex is not yet in the minimum cost wave, we will need to do its replacement as soon as it gets into the wave, in other word we should add its delta to the delta of its parent. If the delta of the parent becomes negative, then we propagate it to its parent, and so on, so that at every moment we have only the optimal wave plus some non-negative deltas remaining.

In order for this to run fast, we need to directly propagate to the closest non-zero ancestor instead of going through all parents in the chain to it, which can be accomplished by keeping a disjoint-set union data structure where each set has a vertex with non-zero delta as its representative, and contains all vertices with zero deltas which have it as the closest non-zero ancestor.

This was quite tricky to come up with, but the implementation was rather simple and without any corner cases or data structures more complex than disjoint-set union.

The second problem I mentioned came from AtCoder: you are given n vertices, and n-1 sets of those vertices, each of size at least 2. Your goal is to pick exactly two vertices in each of the n-1 sets in such a way that if we draw the n-1 edges connecting each chosen pair to each other, we get a tree. n is at most 100000, and the total size of the given sets is at most 200000.

The first step is to think when is this actually impossible. The sample cases provided a helpful example: there was a case where there were two vertices that belonged to just one set, and moreover to the same set. Since only one edge will come out of this set, we will not be able to connect both of those vertices to the rest of the graph.

This obstacle can be generalized as follows: if there's a set of vertices of size k which is less than n such that the number of input sets containing at least one of those vertices is less than k, there's no solution.

But wait, we have seen such obstacles before! Consider the bipartite graph where the left part is the input vertices and the right part is the input sets, with edges connecting each set with its members. If there's no obstacle as described above, then by Hall's theorem in this graph there's a matching covering any (n-1)-element subset of the left part.

Let's find any such matching, it covers some (n-1)-element subset of vertices. The fact that all other (n-1)-element subsets can also be covered by a matching means that if we run one more step of the maximum matching algorithm, it must find augmenting paths from the only remaining uncovered vertex to all covered vertices (so that flipping this augmenting path would result in the solution for the corresponding (n-1)-element subset). These augmenting paths form a subtree of the residual network.

Now we can see that if we take any vertex covered by matching, its previous vertex in the augmenting path tree (which will be in the right part, so represent a set), and the set's previous vertex in the augmenting path tree (which will be in the left part, so represent a vertex), then we get two vertices belonging to the corresponding set, and those n-1 pairs are exactly the solution to our problem as they necessarily form a tree!

Finally, there was an Open Cup problem: two players are playing a game on a rooted tree with 100000 vertices. They make moves in turn. In each move of the first player, she colors one of the leaves of the tree black (initially all leaves are colored white). The root of the tree also initially contains a token. In each move of the second player, she can move the token from a vertex to any adjacent vertex. The second player wins if she moves the token to a leaf that is still white. The first player wins when all leaves are colored black and the second player has not yet won. Who will win if both players play optimally?

First, we can notice that it doesn't make sense for the second player to go up, as they will end up in a strictly worse state than before. So for each subtree, there's at most one moment in the game when the token enters this subtree, and then the game continues just in this subtree.

Moreover, suppose that the first player has made x moves into this subtree before the token enters it. Then it's clear that there's some boundary b such that when x<b the second player wins if the token enters this subtree, and when x>=the first player wins if the token enters this subtree. This finally gives us a linear-time dynamic programming solution: we will compute this boundary b for each subtree going from the bottom to the top.

Thanks for reading, and check back tomorrow!

A Kuhn's week

The Dec 10 - Dec 16 week was livelier than a few previous ones. Codeforces Round 526 started the fun right on Monday (problems, results, top 5 on the left, analysis). Radewoosh continued his string of excellent results, was the only contestant to solve five problems, got the first place with a huge margin and also overtook tourist for the first place in the rating list — very well done!

TopCoder SRM 744 took place on Friday (problems, results, top 5 on the left, my screencast, analysis). Trying to learn from my experience in the TCO final, when approaching the hard problem I have decided to not implement relatively straightforward solutions involving either heavy-light decomposition or propagating piecewise-linear functions up the tree, but instead think a bit more to come up with an easier to implement solution. The strategy has almost backfired as I got my "simple" solution to work with less than two minutes into the contest. However, it did work in the end as a few other contestants going for the straightforward but complicated implementation approaches have failed because of subtle bugs (for example, ainta was above me after the coding phase but his solution failed as he should've used a multiset instead of a set, presumably inside the representation of a piecewise-linear function or something similar).

Here's that problem: you are given a rooted tree with 100000 vertices. Each vertex has a demand di and a cost ci. Your goal is to assign some non-negative value vi to each vertex so that the demand of each vertex is less than or equal to the sum of values on the path from this vertex to the root of the tree, and the sum of ci*vi is minimized. What's the easiest to implement approach, in your opinion?

AtCoder Grand Contest 029 followed on Saturday (problems, results, top 5 on the left, my screencast with commentary, analysis). The problemset was relatively approachable this time, so to win one had to be fast and to minimize incorrect attempts. LHiC executed very well on both fronts and got first place with the margin of 12 penalty minutes. Well done!

I was submitting my last problem around 88-th minute for the first time, but got time limit exceeded on just one testcase. My solution involved Dinic's maximum flow algorithm that I have copy-pasted from a previous solution. Later I submitted the same solution in C++ and it passed in just 0.5s, while the time limit is 4s. Maybe somebody can tell why Java is more than 8x slower this time?

Of course, later I tried submitting Kuhn's maximum matching algorithm in Java instead, which is supposed to be quadratic in the worst case, but it actually passed within the time limit :)

Reducing the problem to maximum matching is also quite fun. Here's the statement: you are given n vertices, and n-1 sets of those vertices, each of size at least 2. Your goal is to pick exactly two vertices in each of the n-1 sets in such a way that if we draw the n-1 edges connecting each chosen pair to each other, we get a tree. n is at most 100000, and the total size of the given sets is at most 200000.

Open Cup 2018-19 Grand Prix of Peterhof took place on Sunday (results, top 5 on the left). Compared to a few previous rounds, the Moscow SU team has cut down dramatically on incorrect attempts, and thus got their first win of the season. Congratulations!

Problem G was a nice dynamic programming exercise. Two players are playing a game on a rooted tree with 100000 vertices. They make moves in turn. In each move of the first player, she colors one of the leaves of the tree black (initially all leaves are colored white). The root of the tree also initially contains a token. In each move of the second player, she can move the token from a vertex to any adjacent vertex. The second player wins if she moves the token to a leaf that is still white. The first player wins when all leaves are colored black and the second player has not yet won. Who will win if both players play optimally? Can you see a way to avoid traversing the entire ~2100000 state space?

Finally, Codeforces ran Avito Cool Challenge 2018 later on Sunday (problems, results, top 5 on the left, analysis). tourist was a bit slower than LHiC but was able to find two challenges and regain the first place in the round, and with it the first place in the rating list. Congratulations!

Thanks for reading, and check back for more!