Algorithms Weekly by Petr Mitrichev: February 2016

Sunday, February 21, 2016

A vertex-dominating week

The Feb 8 - Feb 14 week featured the 8VC Venture Cup 2016 Elimination Round by Codeforces (problems, results, top 5 on the left). With 7 problems there was plenty to choose from, and I think that problem E was the most difficult algorithmically, despite not being valued the highest. You are given a set with 200000 numbers, and need to find the subset of that set where the difference between the mean and the median is the highest. At the first glance, it seems that either some greedy could work, or the problem would require at least O(n²) to solve. However, it turns out neither is true!

Open Cup 2015-2016 Grand Prix of China (results, top 5 on the left). Problem B had a very simple statement and yet an interesting solution: given a bipartite graph, find how many subsets of its vertices are vertex-dominating. A subset of vertices is vertex-dominating if any vertex of the graph either lies on this subset, or has a neighbor in this subset, or both. It is of course the constraints that make this problem interesting. The Open Cup problem told that the graph has at most 30 vertices, but I encourage you to solve a bit tougher problem: let's say the number of vertices is at most 45.

In my previous post, I've mentioned two interesting problems. The first one was about processing a sequence of 10 million numbers using just 1 MB of memory. To be more precise, you're given a way to generate the sequence: start with a given number X₀, then repeatedly compute X_i = ((X_i-1 xor A) + B) mod 2⁵⁰, where A and B are also given. You're looking to find the radius of each element in the sequence, defined as the largest number k such that the previous k elements and the next k elements are strictly less than the given element, and return the sum of the radii for all elements.

The were two key observations required to solve this problem. First, one needed to pay close attention to the way the sequence is generated. In most cases such random generators are given in problem statements simply to avoid handling large input data. However, in this case the generator had one additional important property: it was reversible. Not only can we generate the number that follows the given number, we can also find which number precedes the given number!

The second observation is the fact that the answer can't be very big. More precisely, for a sequence of length N it's O(N*logN): it's not hard to see that the elements that cover the given element with their radii lie exponentially further away. Together with the first observation, we can now write an extremely simple solution: just iterate over the sequence, and find the radius for each element in the most straightforward way, using the ability to generate the next and previous elements! This solution runs in O(N*logN) and needs only O(1) memory.

The other interesting problem was about speeding up exponential dynamic programming. You're given a 10x10 chessboard with some cells removed. You need to find any connected set of cells on it that has exactly W white cells and exactly B black cells, or report that there isn't any.

One could readily imagine a dynamic programming solution that goes row by row, remembering how many white cells we've taken so far, how many black cells we've taken so far, which cells are taken amount the last jagged row, and which connected components they form. However, such DP has too many states (roughly 10000 states of the jagged row with connected components, times 100 steps, times 50 white cells, times 50 black cells equals 2.5 billion, and although not all those states are reachable, a big fraction of them are) and will never run in time.

In order to speed it up, instead of remembering the number of white and black cells taken, we will find the maximum number of black cells for each number of white cells. This reduces the number of states by the factor of 50, and the solution fits into the time limit now. However, this doesn't give us the answer we need: in the end we need a concrete number of black cells, and we'll only know the upper bound.

Here comes the main idea, which in my opinion is amazing. If we have a connected set with a given number of white cells, but with more black cells than we need, we can start removing black cells while keeping the set connected. We can't do this forever, though - at some point removing any black cell will cause the set to become disconnected. However, we can notice that in such state the number of white cells is always strictly greater than the number of black cells! In other words, this simple solution will work just fine if the number B of black cells we need is >= the number W of white cells we need. And what to do if it's less? Well, we can swap the colors then and apply the same solution!

Thanks for reading, and check back soon for the last week's summary!

Monday, February 8, 2016

A Saratov week

Codeforces has hosted yet another company-sponsored round last week, AIM Tech Round (problems, results, top 5 on the left). scott_wu has demonstrated an amazing bug-finding performance by challenging six other solutions, but in the end just his problem-solving speed alone would be enough for the victory. Congratulations!

TopCoder SRM 681 on Saturday (problems, results, top 5 on the left) has provided the competitors with an unorthodox medium problem. I had no clue how to solve it, so hats off to kcm1700, the only competitor to solve all three problems and claim a well-deserved victory!

Here's what that problem was about. You are given a sequence of 10 million numbers. To be more precise, you're given a way to generate it: start with a given number X₀, then repeatedly compute X_i= ((X_i-1 xor A) + B) mod 2⁵⁰, where A and B are also given. You're looking to find the radius of each element in the sequence, defined as the largest number k such that the previous k elements and the next k elements are strictly less than the given element, and return the sum of the radii for all elements. The catch is, you have to do it while using at most 1 MB of memory - so you can only store about 1% of the sequence in memory at the same time!

Open Cup 2015-16 Grand Prix of Saratov (problems, results, top 5 on the left) has capped the week with another very nice problemset. Problem C, which we couldn't solve during the contest, turned out to have a very beautiful solution - so kudos to Past Glory team on solving it on the way to the first place!

The said problem looked somewhat standard on the surface. You're given a 10x10 chessboard with some cells removed. You need to find any connected set of cells on it that has exactly W white cells and exactly B black cells, or report that there isn't any. One could readily imagine a dynamic programming solution that goes row by row, remembering how many white cells we've taken so far, how many black cells we've taken so far, which cells are taken amount the last jagged row, and which connected components they form. However, such DP has too many states (roughly 10000 states of the jagged row with connected components, times 100 steps, times 50 white cells, times 50 black cells equals 2.5 billion, and although not all those states are reachable, a big fraction of them are) and will never run in time. So how does one solve this problem?

As requested in comments to my previous post, let's talk a bit more about one of the boomerang problems. You were given a 100x100 grid with each cell containing a boomerang pointing to two adjacent sides of this cell. When two boomerangs point towards one another, we call it a conflict. We can rotate a boomerang by 90 degrees in either direction at most K times (K <= 1000, we can pick K different boomerangs for rotation, or maybe rotate some boomerang several times). What is the smallest number of conflicts we can have after at most K rotations by 90 degrees?

The key idea in solving this problem is that resolving horizontal conflicts and resolving vertical conflicts can be done independently. It's not clear from the first sight, since a boomerang is "connected", and thus when rotating it we're affecting both its horizontal and its vertical conflicts. However, we can notice that rotating a boomerang by 90 degrees causes one of the two things: its horizontal hand starts pointing in the other direction, or its vertical hand starts pointing in the other direction. Moreover, no matter which position the boomerang is in, we can always achieve both those outcomes by a 90 degree rotation! In other words, one rotation can change horizontal or vertical situation, but not both, and that's why horizontal and vertical conflicts are resolved independently.

Having noticed this, the problem breaks into even smaller pieces: it's somewhat obvious now that each row's and each column's conflicts can also be resolved independently. So instead of one monolithic 100x100 2D problem, now we have 200 separate 1x100 1D problems. Each 1D problem is solved with a somewhat classical O(N²) dynamic programming approach which finds what's the largest number of conflicts we can resolve using A moves in the first B cells, leaving the last cell pointing in a given direction. Each subproblem's output is the largest number of conflicts that can be resolved using 1, 2, ..., 100 moves. Now we need to combine those outputs into the overall solution, which is once again a standard dynamic programming question: what's the largest number of conflicts we can resolve using A moves in the first B subproblems. The overall running time of this solution is O(N³).

Thanks for reading, and check back next week!

Saturday, February 6, 2016

A week with boomerangs

The end of Jan 25 - Jan 31 week was tightly packed with contests.

First off, TopCoder SRM 680 took place on Thursday evening (problems, results, top 5 on the left). tourist was once again the only contestant to solve the hard problem, and thus enjoyed a 400+-point victory margin - great job! It was especially impressive given that at the end of the coding phase there were 23 submissions for this problem, and one could not suspect all but one would fail.

Here's the tricky problem: you have a string with 2500 characters, each character being one of the 6 letters from 'a' to 'f', and a number k up to 6 of the same parity as n. You need to select k of the string's characters, and split the remaining (n-k) characters into (n-k)/2 pairs, in such a way that each pair has distinct characters. Pairing character i and character j together costs c[i] + c[j] + 100 * abs(i - j), where c is a given array. What is the minimum total cost to build all those pairs? The statement is a bit convoluted, but once you wrap your head around it, the problem itself is quite interesting.

Wunder Fund Round 2016 happened on Codeforces about one day later (problems, results, top 5 on the left). Egor has managed to pull out an amazing last-minute victory, submitting problem F on the last minute - and getting it pass the system test.

That problem is similar to the ones from mathematical competitions. You are given two multisets of n integers each, each integer between 1 and n inclusive. Find a non-empty subset of the first multiset and a non-empty subset of the second multiset, such that they have the same sum.

Come the next evening - and there's one more competition, this time the final elimination round of the Facebook Hacker Cup, with 25 spots in the onsite competition in London up for grabs (problems with Facebook login, results with Facebook login, top 5 on the left). The difficulty of the problemset was chosen very appropriately, and the contest was mostly about solving difficult problems instead of speed coding, which is awesome. Bruce has solved 4 problems with the smallest penalty time, and will be one of the favorites to win the final round!

I think the second problem was the most beautiful one in this round (picture on the left from the official website). You were given a 100x100 grid with each cell containing a boomerang pointing to two adjacent sides of this cell. When two boomerangs point towards one another, we call it a conflict. We can rotate a boomerang by 90 degrees in either direction at most K times (K <= 1000, we can pick K different boomerangs for rotation, or maybe rotate some boomerang several times). What is the smallest number of conflicts we can have after at most K rotations by 90 degrees?

Finally, the Open Cup returned to action for the first time in 2016 on Sunday with the Grand Prix of Asia (problems, results, top 5 on the left). In line with their awesome performance at the ongoing Petrozavodsk training camp, the Warsaw Eagles team won this contest as well - congratulations!

This round had so many interesting and beautiful problems, that I will highlight two. First, problem D asked you to count the number of ways to interleave two given permutations of size N (two ways are considered different if the resulting sequences of 2N numbers are different). N is up to 2000, and you need to find the answer modulo 10⁹+7. For example, if the given permutations are "1, 2" and "2, 1", then there are four such ways: "1, 2, 2, 1", "1, 2, 1, 2", "2, 1, 1, 2" and "2, 1, 2, 1".

Problem H was concerned with random walks on a 2D grid. You start from cell (0, 0), and make N random independent steps, moving in one of the four cardinal directions at each step. What is the expected number of cells you will visit? N is up to 5000.

That was one hell of a problem-packed post :) I hope you enjoy solving them!

A week without FFT

The Jan 18 - Jan 24 week started with TopCoder SRM 679 on Wednesday (problems, results, top 5 on the left). The hard problem in this round is somewhat weird: the solution is quite straightforward, but somehow a few people, including myself, missed it and tried to squeeze in an incorrect FFT-based solution that obviously times out. It didn't stop tourist, who won the round with an impressive 350+-point margin.

Here's how the problem went: you were given 500 bags, each containing a lot of cards, with each card containing an integer between 1 and 500 (more precisely, you have a 500x500 matrix, describing how many cards contain each integer in each bag). You're also given a set of "good" integers, each between 1 and 1000. For each pair of bags, you need to answer the following question: how many ways are there to pick a card from the first bag and a card from the second bag in such a way that the sum of their numbers is a good number?

The correct solution runs in O(n^3), while the somewhat obvious FFT-based approach is only O(n^3*logn), and thus too slow. Can you see the O(n^3) one?

Facebook Hacker Cup selection continued with Round 2 on Saturday (problems with Facebook login, results with Facebook login, top 5 on the left). This was the first round where the time of submitting one's solutions was taken into account, and thus the first round where the contestants were competing against each other, not just against the problems. Eryx has claimed the first place with a solid margin - congratulations!

The problems were quite technical again, but the hardest problem at least allowed one to simplify the solution using a creative idea. You were given a tree with 1000 nodes, and needed to paint it with 30 colors, with the cost of painting each node with each colors given as a matrix. So far, the solution is very straightforward - just pick the cheapest color for each node. However, there was a small twist: you also paid a given penalty P for each node that has neighbors of the same color. You pay P only once per such node, no matter how many neighbors of the same color does it have. Can you solve this problem in O(N*K⁴) (N=1000, K=30)? How about O(N*K³)?

Let me also explain the solution to a problem from the previous post: you were given a tree with 100000 vertices, and at most 100000 queries. Each query highlighted some subset of vertices, and to answer a query you needed to compute the minimum number of non-highlighted vertices to remove that would leave all highlighted vertices disconnected, or to report there isn’t a way (in case two highlighted vertices are directly connected with an edge). The total number of highlighted vertices in all queries also doesn’t exceed 100000.

First, let's learn to solve the problem for just one query. It's not hard to see that this can be done greedily: we go from leaves to the root, returning a boolean value from the subtree: does this subtree contain a highlighted vertex? When a vertex has more than one subtree containing a highlighted vertex, or when its parent is highlighted and it has at least one highlighted child, we need to remove it.

Given that our solution returns just one boolean value for each subtree, we can handle 64 queries at once by returning a bitmask from it, using bitwise operations to implement the logic described above. This is (arguably?) not an asymptotic optimization, but 100000²/64 for this to pass under the time limit.

Thanks for reading, and check back soon for the summary of the last week of January!

An Ewok week

The Jan 11 – Jan 17 week saw the return of the main algorithmic competition anchors in 2016: TopCoder and Codeforces.

A Star Wars-themed TopCoder SRM 678 happened very early on Wednesday (problems, results, top 5 on the left). Congratulations to freak93 on the victory and on being the only contestant to solve the hard problem correctly! Apparently the Ewoks are really good at algorithms, as only one human could solve their problem :)

Codeforces Round 339 took place at the usual Codeforces time on Thursday evening (problems, results, top 5 on the left). All problems were somewhat technical, but at the same time allowed one to demonstrate algorithmic competition experience. Problem D, for example, tested one’s skills in batch processing of requests on a tree. You were given a tree with 100000 vertices, and at most 100000 queries. Each query highlighted some subset of vertices, and to answer a query you needed to compute the minimum number of non-highlighted vertices to remove that would leave all highlighted vertices disconnected, or to report there isn’t a way (in case two highlighted vertices are directly connected with an edge). The total number of highlighted vertices in all queries also doesn’t exceed 100000. Such problems usually allow O(n^2), O(N*sqrt(n) and O(n*polylog(n)) solutions, with the former supposed not to run in time. However, in this particular case a O(n^2) solution could pass if optimized using bitmasks – can you see how?

Facebook Hacker Cup Round 1 on the weekend gave one a chance to get closer to the first onsite competition of the year, to happen in London in March (problems with Facebook login, results with Facebook login, top 5 on the left). The round also featured somewhat standard technical problems. The hardest problem was perhaps the least technical: you were given 16 players, a 16x16 matrix of who wins whom, and needed to find the lowest and the highest possible elimination round for each player in an Olympic elimination tournament with 4 rounds, over all possible seedings.

I’ve described a New Year problem last week: you are given a positive integer A with at most 1000 digits. Find any two positive integers X and Y such that X+Y=A, and both X and Y are palindromes when written in decimal notation without leading zeroes.

In order to solve this problem, let’s first assume there’s no carry – each digit of A is formed directly as the sum of corresponding digits of X and Y, which is always less than 10. In case X and Y have different lengths, there’s at most one solution with given lengths of X and Y and it’s easy to find. Without loss of generality, assume X has more digits than Y. The first digit of X is then equal to the first digit of A (remember, there’s no carry yet!). Since X is a palindrome, we know its last digit as well. But then we can find the last digit of Y by subtraction. And then we can find the second digit of X, and so on. In case they have equal lengths, there might be a whole lot of solutions, but all of them are equivalent in some sense – in effect, we know the sum of corresponding digits of X and Y, and have several ways to decompose each sum into two summands independently. For example, 4=1+3=2+2, so 44=11+33=22+22=12+32=21+23.

Now how does one handle the carry? First of all, notice that for the last digits, we can determine that carry as we go from right to left. For the first digits, however, we go from left to right, and thus need to know the carry from the next digit to determine a given digit by subtraction. When we don’t know it, we will do backtracking – try both a carry of 0 and a carry of 1 recursively. At the first glance, this would seem exponential, but after taking some time to contemplate, one can see that most branches would be truncated very early because some digit would get negative or more than 9, and in fact this backtracking runs very fast.

Thanks for reading, and check back soon for the next week’s problems!