Sunday, February 22, 2015

This week in competitive programming

Codeforces Round 292 took place late on Tuesday (problems, results, top 5 on the left). Haghani has pulled off an impressive victory given that he was not the rating favorite - but of course he did climb much higher in the ratings as a result, now he's ranked 28th. Congratulations! rng_58, on the other hand, could've probably been the rating favorite if not for his recent strategy of solving the hardest problem first. Once again he became the only person to solve the said hardest problem, but couldn't solve two other problems in time and only placed 10th.

TopCoder SRM 650 started next morning less than 8 hours after the Codeforces round ended (problems, results, top 5 on the left), so big congratulations to Endagorion on winning it in addition to placing 3rd earlier, and for being the only one to solve the hard problem!

There were also four Open Cup rounds in January and February. The problems from those rounds are not published yet, but it's been a month now and it's boring to wait longer, so I will try to explain the problem statements in full below.

Open Cup 2014-15 Grand Prix of Japan took place three weeks ago (results, top 5 on the left). Problem I was, on the surface, a repeat of a problem from Petr Mitrichev Contest 9 back from 2011: it simply asked to count the number of pairs of strings (s, t) such that the length of s is N, the length of t is M, each character is from an alphabet of size A, and t is a substring of s. When this problem was given on my contest, N, M and A were up to 100, and you needed to output a floating-point number - the probability that a random pair has this property. No team could solve this problem during the contest - take a look at problem H in standings. The reference solution made heavy use of the fact that we need to output a fraction and are allowed 10^-9 error, since for longer substrings the probability is almost zero, and for shorter substrings we can iterate over all possible prefix functions of the substring, and when the prefix function is fixed counting superstrings is a relatively straightforward dynamic programming on the Knuth-Morris-Pratt automaton.

Fast forward 3.5 years, and now the harder version is given on a contest: now N is up to 200, M is up to 50, A is up to 1000, and you need to find the answer modulo 10^9+7, so you can't squeeze by with approximation arguments anymore. And one team does solve this problem - congratulations to Alex, Slava and Artem!

The key to solving the harder version is to notice that we don't, in fact, need the entire prefix function of the substring to count the number of superstrings containing it: it turns out that we can just look at the set of prefixes of the substring that are also its suffixes, in other words at the set of its borders. Can you see why the borders are enough?
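As a refresher, the borders of a string can be read off its KMP prefix function: the longest border has length pi[n-1], the next one has length pi[pi[n-1]-1], and so on down to zero. Here's a small standalone sketch of that standard computation (generic code, not the contest solution):

```java
import java.util.*;

public class Borders {
    // Standard KMP prefix function: pi[i] is the length of the longest
    // proper prefix of t[0..i] that is also a suffix of it.
    static int[] prefixFunction(String t) {
        int[] pi = new int[t.length()];
        for (int i = 1; i < t.length(); ++i) {
            int k = pi[i - 1];
            while (k > 0 && t.charAt(i) != t.charAt(k)) k = pi[k - 1];
            if (t.charAt(i) == t.charAt(k)) ++k;
            pi[i] = k;
        }
        return pi;
    }

    // The border lengths of t are exactly pi[n-1], pi[pi[n-1]-1], ...
    static List<Integer> borders(String t) {
        int[] pi = prefixFunction(t);
        List<Integer> res = new ArrayList<>();
        int k = pi[t.length() - 1];
        while (k > 0) {
            res.add(k);
            k = pi[k - 1];
        }
        return res;
    }

    public static void main(String[] args) {
        // "abacaba" has borders "aba" and "a", so lengths 3 and 1.
        System.out.println(borders("abacaba"));
    }
}
```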

Thanks for reading, and check back next week for the full explanation of this problem's solution, and for a new hard problem!

Thursday, February 19, 2015

This week in competitive programming

Theorem (Generalized Cayley's formula?).

Given a graph with n labeled vertices, let s be the number of its connected components and m_1, m_2, ..., m_s be the numbers of vertices in those components. Then the number of ways to add s-1 edges to this graph so that it becomes connected is n^(s-2) · m_1 · m_2 · ... · m_s.

When Mikhail `Endagorion' Tikhomirov told me this theorem, I was very surprised that it looks so simple yet I've never encountered it before. Is it a well-known result? Does it have a name?
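The formula is at least easy to sanity-check by brute force. Here's a sketch on a made-up tiny instance of my own: n = 4 vertices with a single existing edge 0-1, so components of sizes 2, 1, 1 and s = 3; the formula predicts 4^(3-2) · 2 · 1 · 1 = 8 ways to connect the graph with 2 added edges.

```java
import java.util.*;

public class CayleyCheck {
    // Union-find with path compression, for connectivity checks.
    static int find(int[] p, int x) { return p[x] == x ? x : (p[x] = find(p, p[x])); }

    public static void main(String[] args) {
        int n = 4;
        // Candidate edges to add: all pairs except the existing edge 0-1.
        List<int[]> candidates = new ArrayList<>();
        for (int u = 0; u < n; ++u)
            for (int v = u + 1; v < n; ++v)
                if (!(u == 0 && v == 1)) candidates.add(new int[]{u, v});
        // Brute force: choose s-1 = 2 candidate edges, count the choices
        // that make the whole graph connected.
        int brute = 0;
        for (int i = 0; i < candidates.size(); ++i)
            for (int j = i + 1; j < candidates.size(); ++j) {
                int[] p = new int[n];
                for (int v = 0; v < n; ++v) p[v] = v;
                p[find(p, 0)] = find(p, 1); // the existing edge
                p[find(p, candidates.get(i)[0])] = find(p, candidates.get(i)[1]);
                p[find(p, candidates.get(j)[0])] = find(p, candidates.get(j)[1]);
                int roots = 0;
                for (int v = 0; v < n; ++v) if (find(p, v) == v) ++roots;
                if (roots == 1) ++brute;
            }
        // Formula: n^(s-2) * product of component sizes = 4^1 * 2 * 1 * 1.
        int formula = 4 * 2;
        System.out.println("brute=" + brute + " formula=" + formula);
    }
}
```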

This theorem comes in handy when solving a problem from the previous week's summary: how many simple undirected graphs are there with n labeled vertices and k bridges?

Here's how we can solve this problem: in addition to the number a_{n,k} of graphs with n vertices and k bridges, we'll also count the number b_{n,k} of connected graphs with n vertices and k bridges (including, as the k=0 case, the number of biconnected graphs), the value c_{n,k}, which is the sum, over all graphs with n vertices and k connected components such that every connected component is biconnected, of the products of the sizes of the components, and finally the number d_n of connected graphs with n vertices.

Let's start with b_{n,k} for positive k. Since the graph has at least one bridge, the bridges split it into several biconnected components, and the Generalized Cayley's formula mentioned above helps count the ways to join those biconnected components with bridges into a connected graph, so we get b_{n,k} = n^(k-1) * c_{n,k+1}. Now b_{n,0} is simply d_n minus the sum of b_{n,k} over positive k.

To find c_{n,k}, we first notice that c_{n,1} = n * b_{n,0}. For larger k, we should consider how many vertices there are in the biconnected component that contains the first vertex - let's say it has m vertices; then we have to multiply the number of ways to pick the remaining m-1 vertices in the component by the number of ways to pick the remaining components. In other words, c_{n,k} = sum over m of comb(n-1, m-1) * c_{m,1} * c_{n-m,k-1}.

Finding d_n is a standard question that is solved in a similar way: connected graphs are all graphs minus disconnected graphs, and disconnected graphs can be categorized by the size of the connected component containing the first vertex. In other words, d_n = 2^(n(n-1)/2) - sum over m of comb(n-1, m-1) * d_m * 2^((n-m)(n-m-1)/2).

And finally, we can find the value that we need: a_{n,k}. Again, let's look at the connected component containing the first vertex, and suppose it has m vertices and p bridges. Then we get a_{n,k} = sum over m and p of comb(n-1, m-1) * b_{m,p} * a_{n-m,k-p}.

We compute O(n^2) values, spending O(n) for each value in b, c and d, but O(n^2) for each value of a, so the total running time is O(n^4).

Here's the code:
public class Fragile {
    static final long MODULO = (long) (1e9 + 7);

    public int countGraphs(int N, int K) {
        // a[n][k]: graphs with n vertices and k bridges
        long[][] a = new long[N + 1][N + 1];
        // b[n][k]: connected graphs with n vertices and k bridges
        long[][] b = new long[N + 1][N + 1];
        // c[n][k]: sum over graphs with n vertices and k connected components,
        // each biconnected, of the products of the component sizes
        long[][] c = new long[N + 1][N + 1];
        // d[n]: connected graphs with n vertices
        long[] d = new long[N + 1];
        d[0] = 1;
        long[][] comb = new long[N + 1][N + 1];
        comb[0][0] = 1;
        a[0][0] = 1;
        // p2[i] = 2^i, enough powers for every possible edge count
        long[] p2 = new long[N * (N - 1) / 2 + 1];
        p2[0] = 1;
        for (int i = 1; i < p2.length; ++i) {
            p2[i] = 2 * p2[i - 1] % MODULO;
        }

        for (int n = 1; n <= N; ++n) {
            comb[n][0] = 1;
            for (int k = 1; k <= N; ++k) {
                comb[n][k] = (comb[n - 1][k - 1] + comb[n - 1][k]) % MODULO;
            }
            // d[n]: all graphs minus those where the component
            // of the first vertex has m < n vertices
            d[n] = p2[n * (n - 1) / 2];
            for (int m = 1; m < n; ++m) {
                d[n] = (d[n] - comb[n - 1][m - 1] * d[m] % MODULO * p2[(n - m) * (n - m - 1) / 2] % MODULO + MODULO) % MODULO;
            }
            // c[n][k] for k >= 2: the component of the first vertex gets m vertices
            for (int k = 2; k <= n; ++k) {
                for (int m = 1; m < n; ++m) {
                    c[n][k] = (c[n][k] + comb[n - 1][m - 1] * c[m][1] % MODULO * c[n - m][k - 1]) % MODULO;
                }
            }
            // b[n][k] = n^(k-1) * c[n][k+1] by the generalized Cayley's formula;
            // b[n][0] is whatever remains of d[n]
            b[n][0] = d[n];
            long pw = 1;
            for (int k = 1; k < n; ++k) {
                b[n][k] = pw * c[n][k + 1] % MODULO;
                pw = pw * n % MODULO;
                b[n][0] = (b[n][0] - b[n][k] + MODULO) % MODULO;
            }
            // c[n][1]: one biconnected component, weighted by its size n
            c[n][1] = n * b[n][0] % MODULO;
            // a[n][k]: the component of the first vertex has m vertices and p bridges
            for (int k = 0; k < n; ++k) {
                for (int m = 1; m <= n; ++m) {
                    for (int p = 0; p <= k; ++p) {
                        a[n][k] = (a[n][k] + comb[n - 1][m - 1] * b[m][p] % MODULO * a[n - m][k - p]) % MODULO;
                    }
                }
            }
        }

        return (int) a[N][K];
    }
}
Do you know a simpler way to solve this problem? Do you know a faster way to solve this problem?

My solution at the contest was even more complicated, with O(n^4) states and O(n^6) running time, so it didn't complete in 2 seconds and I had to precompute the answers and submit those. I've used the following dynamic programming: how many graphs are there with n vertices, m vertices in the biconnected component containing the first vertex, k bridges, and p bridges going out from the biconnected component containing the first vertex.

Now, let's turn to last week's only regular online competition - TopCoder SRM 649 (problems, results, top 5 on the left). Fourteen people solved all three presented problems, but sdya has managed to create a comfortable 100+ point margin with fast solutions and a challenge - congratulations!

Thanks for reading, and check back next week!

Friday, February 13, 2015

This week in competitive programming

TopCoder SRM 648 has started the last week's contests as early as Monday (problems, results, top 5 on the left, my screencast). The hard problem, while innocent on the surface, turned out to be exceptionally tricky, requiring some serious combinatorics and dynamic programming mastery - check it out! The statement is as simple as: how many simple undirected graphs are there with N labeled vertices and K bridges? N is up to 50.

Codeforces Round 290 took place just several hours later (problems, results, top 5 on the left, my screencast). Problem D was another cute combinatorics exercise. Its most interesting part boiled down to: given a tree, how many ways are there to remove its leaves one by one until we remove everything? Of course, new leaves may appear after removing some. Had the tree been rooted, this would be a pretty standard dynamic programming application, but how to deal with the fact that there's no root?

Finally, Rockethon 2015 gave everybody a shot at prizes on Saturday (problems, results, top 5 on the left). Gennady has carefully progressed through all problems, not really leaving anybody else a chance - congratulations!

There were also several Open Cup rounds in the past weeks, and the Petrozavodsk training camp has assembled an unprecedented number of ACM ICPC teams - but as the same problems are still going to be used for other training camps, I will postpone discussing those for now. Stay tuned for some really tough problems in the coming weeks!

Wednesday, February 4, 2015

This week in competitive programming

Facebook Hacker Cup 2015 has completed its weekly routine with Round 3 on Saturday night (problems with Facebook login, results with Facebook login, top 5 on the left, my screencast). Top 25 will travel to the United States for the final round on March 6, and big congratulations to Gleb who claimed the first place in this really competitive round!

Let's also come back to the interesting problem I mentioned in last week's summary: you are given a directed graph where each arc has a cost, one of the vertices is marked as starting point, and one as ending point. You need to pick a subset of arcs with the smallest total cost such that every path from the starting point to the ending point (including paths that pass through the same vertex or arc more than once) passes through arcs from this subset exactly once.

Whenever we need to pick a subset of arcs that splits a graph in two, the theory of flows and cuts comes to mind. However, we're not simply asked to find a minimum cut on a given graph, and not even on its condensation, as illustrated by the following acyclic example:

In this graph, the minimum cut is of size 2 and separates vertices S and A from vertices B and T. However, the path S-B-A-T crosses the cut three times (twice going outside, and once going inside), and we need for each path to cross the cut exactly once. Basically, the minimum cut is solving the "at least once" version.

But the power of the flow/cut theory often lies in applying it to slightly modified graphs. For cuts in particular, adding infinite arcs to the graph is usually the solution of choice. Here our goal is to make sure one can't re-enter the cut after leaving it, so for each arc we'll add a reverse arc with infinite capacity, as illustrated by the following picture:

In particular, the newly added A-B arc of infinite capacity changes the incorrect cut displayed above from capacity 2 to infinite capacity (1+1+infinity), and thus the smallest cut is now of size 3, displayed by the dotted line to the left. There's also another cut of size 3 - just cut off the source vertex S. It's not hard to verify that every path from S to T crosses this cut exactly once.

Another great thing about this addition of infinite arcs is that it allows us to skip the condensation step. If there's a cycle somewhere in our graph, we'll add a reverse infinite cycle alongside it, and thus no minimum cut will ever cross any cycle!
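Since the original pictures aren't reproduced here, here's a self-contained sketch on a made-up instance with the same flavor (arc costs are my own, not the ones from the problem): arcs S→A and B→A of cost 3, S→B and A→T of cost 1, B→T of cost 3. The plain minimum cut is 2, achieved by the invalid cut separating {S, A} from {B, T}, which the path S-B-A-T crosses three times; after adding the infinite reverse arcs the minimum cut grows to 4 - for example, cutting both arcs out of S.

```java
import java.util.*;

public class CutDemo {
    // Edmonds-Karp max-flow (= min-cut value) on a capacity matrix.
    static int maxFlow(int[][] cap, int s, int t) {
        int n = cap.length, flow = 0;
        int[][] c = new int[n][n];
        for (int i = 0; i < n; ++i) c[i] = cap[i].clone();
        while (true) {
            int[] prev = new int[n];
            Arrays.fill(prev, -1);
            prev[s] = s;
            Deque<Integer> q = new ArrayDeque<>();
            q.add(s);
            while (!q.isEmpty()) {
                int u = q.poll();
                for (int v = 0; v < n; ++v)
                    if (prev[v] == -1 && c[u][v] > 0) { prev[v] = u; q.add(v); }
            }
            if (prev[t] == -1) return flow;
            int aug = Integer.MAX_VALUE;
            for (int v = t; v != s; v = prev[v]) aug = Math.min(aug, c[prev[v]][v]);
            for (int v = t; v != s; v = prev[v]) { c[prev[v]][v] -= aug; c[v][prev[v]] += aug; }
            flow += aug;
        }
    }

    public static void main(String[] args) {
        final int S = 0, A = 1, B = 2, T = 3, INF = 1000;
        int[][] cap = new int[4][4];
        cap[S][A] = 3; cap[S][B] = 1; cap[B][A] = 3; cap[A][T] = 1; cap[B][T] = 3;
        int naive = maxFlow(cap, S, T); // plain min cut: 2, via an invalid cut
        // Add an infinite reverse arc for every original arc.
        int[][] cap2 = new int[4][4];
        for (int u = 0; u < 4; ++u)
            for (int v = 0; v < 4; ++v) {
                cap2[u][v] += cap[u][v];
                if (cap[u][v] > 0) cap2[v][u] += INF;
            }
        int fixed = maxFlow(cap2, S, T); // min cut with reverse arcs: 4
        System.out.println("naive=" + naive + " fixed=" + fixed);
    }
}
```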

Such observations make this solution "click" when you see it: instead of fighting with the problem, everything flows naturally and the solution makes sense from all angles. I especially value such moments - I think they show the beauty of mathematics and algorithms.

However, everything was not that simple in this problem :) It turns out that in order for this solution to work correctly in all cases, we need to prune the graph before adding the infinite arcs. More specifically, we need to remove from the graph all vertices that don't lie on any path from S to T. Can you see where we went wrong, and why the cut theory doesn't save us in this case?

Thanks for reading, and check back next week!

Sunday, January 25, 2015

This week in competitive programming

Saturday night was the competition time this week, starting with TopCoder SRM 647 (problems, results, top 5 on the left, my screencast). The hard problem had a very simple and interesting statement, might've looked as a relatively straightforward maximum flow application, but it came with an unexpected twist that foiled many experienced competitors. You were given a directed graph where each arc has a cost, one of the vertices is marked as starting point, and one as ending point. You needed to pick a subset of arcs with the smallest total cost such that every path from the starting point to the ending point (including paths that pass through the same vertex or arc more than once) passes through arcs from this subset exactly once.

The idea that we can use minimum cut theory to find this subset looks natural. But just applying minimum cut algorithms on the given graph doesn't fit the bill: it is indeed guaranteed that we will cross every cut at least once when walking from the starting point to the ending point, but we might also cross a cut twice or more, and this is not allowed by the problem statement. The usual recipe in situations like this is adding auxiliary infinite arcs to the graph which will essentially limit the set of cuts we look at, since the minimum cut will obviously contain no infinite arcs. Can you see how to add infinite arcs to this graph to solve the problem?

Just a few hours later Facebook Hacker Cup 2015 Round 2 narrowed the field of competitors to just one hundred algorithmists (problems with Facebook login, results with Facebook login, top 5 on the left, my screencast). Once again, the really punishing scoring model meant making sure that the solutions are correct was more important to qualifying than solving problems fast. Nevertheless congratulations to Ivan on claiming the first place! The competition at the ACM ICPC World Finals is going to be very fierce this year.

 
Somewhat orthogonally, here are a couple of pictures from my recent touristic visit to London. Quite different skies and quite different surroundings, London is full of everything :) Thanks for reading, and check back next week!

Thursday, January 22, 2015

This week in competitive programming

Contrary to last week's serenity, this week has been really busy in terms of contests. Early on Monday, Codeforces Round 285 started the sequence (problems, results, top 5 on the left). Competitors in places from second to fifth remained within one successful challenge from each other, but Boris was not within reach and achieved a commanding victory - congratulations!

TopCoder SRM 645 happened later on Monday (problems, results, top 5 on the left, my screencast). The medium problem required careful logical thinking to apply several relatively easy steps in sequence to obtain a working solution. You were given two sets of points on the plane, of the same size, which was at most 50, and also exactly three magic points on the same plane. In one step, you were allowed to pick one of the three magic points, and rotate all points in the first given set by 180 degrees with the selected magic point as the center of rotation. Note that all points are rotated at the same time and with the same center of rotation. Is it possible to obtain the second set of points via zero or more such steps from the first set of points?

TopCoder SRM 646 continued the fun early on Friday (problems, results, top 5 on the left). Tiancheng continued to show impressive form (previous post) and climbed into the 3rd spot (shared with Tomek) in all-time SRM wins stats - awesome performance, both this week and in the past years!

Facebook Hacker Cup Round 1 occupied 24 hours on the weekend (problems with Facebook login, results with Facebook login, top 5 on the left, my screencast). Of course, one didn't need to spend that much time since the problems were relatively simple. The challenge, as usual with the Hacker Cup, was in the format that punishes every mistake very heavily: since the round lasted for 24 hours, many competitors who would not solve these problems under tighter time constraints got a chance to steadily work towards solving all four, and that, in turn, meant that most likely all 500 qualifying spots would be taken by people who solved everything, and thus almost any mistake could mean a good-bye to the Hacker Cup. The cutoff ended up being 75 points, not 100 points, but mistakes in solving the hardest "Corporate Gifting" problem were still punished with elimination. Because of this, this round brought slightly different qualities into the limelight: testing one's solutions very carefully, the ability to read the statement without any omissions, and having a good proof for each and every mathematical solution. To put it another way, it was all about patience :)

Finally, Codeforces Round 286 ended the week late on Sunday (problems, results, top 5 on the left, my screencast). After reading all problems, I was unable to come up with an algorithm for any of the five problems, which is quite unusual for Codeforces since problems A and B are usually not that difficult. Given that all problems required thinking, I've decided that I might as well go after the one that was scored the highest, and that turned out to be too optimistic. The problem simply asked to count the number of palindromes of the given length (up to 10^9) containing the given string (of length at most 200) as a subsequence (not necessarily as a substring - i.e. the characters don't have to be consecutive), modulo 10007.

You can see my slow progress towards a solution in the screencast. At about the 37-minute mark, I've implemented a slow solution and automatic period detection, hoping that the answers for consecutive lengths would form a sequence with a short period - but they would not. Then, at about 50 minutes in, I came up with the idea that the slow solution can actually be improved by considering entire segments of the palindrome before the next match of the subsequence at once, thus separating the result into a large sum of products of powers of 24, 25 and 26 (more details probably in a later post), but after implementing this part around the 1 hour 1 minute mark I realized that I still had no idea how to quickly compute the sum I had reduced the problem to.

Around the 1 hour 12 minute mark I realized that this sum can be computed by taking 200 relatively small matrices to a large power modulo 10007, something we can do relatively quickly. I finished the implementation and submitted around the 1 hour 22 minute mark - only to learn that the solution was still too slow. Some more thinking led me to the idea of computing all 200 numbers I needed using just one matrix power, but the matrix unfortunately became bigger, and thus the solution was still about 50% slower than needed. A 1.5x factor was already in the 'non-asymptotic' speedup territory, so I switched to speeding up the matrix multiplication itself, first by using the modulo operation less often, and then by taking advantage of the fact that the matrix had corners of zeroes that didn't need to be recomputed. That finally enabled me to make a successful submission at 1 hour and 49 minutes into the contest, with too little time left to solve any other problem.
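The workhorse of that approach is raising a matrix to a huge power modulo 10007 by binary exponentiation. The actual contest matrices aren't reproduced here; the sketch below is generic, with a 2x2 Fibonacci matrix standing in as a sanity check, and it includes the skip-zero-entries trick that helps with the constant factor:

```java
public class MatPow {
    static final int MOD = 10007;

    static int[][] multiply(int[][] x, int[][] y) {
        int n = x.length;
        int[][] z = new int[n][n];
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k) {
                if (x[i][k] == 0) continue; // skip zero entries: cheap constant-factor win
                for (int j = 0; j < n; ++j)
                    z[i][j] = (z[i][j] + x[i][k] * y[k][j]) % MOD;
            }
        return z;
    }

    // Binary exponentiation: O(n^3 log e) instead of e-1 multiplications.
    static int[][] power(int[][] m, long e) {
        int n = m.length;
        int[][] r = new int[n][n];
        for (int i = 0; i < n; ++i) r[i][i] = 1; // identity matrix
        while (e > 0) {
            if ((e & 1) == 1) r = multiply(r, m);
            m = multiply(m, m);
            e >>= 1;
        }
        return r;
    }

    public static void main(String[] args) {
        // Sanity check via Fibonacci: [[1,1],[1,0]]^n has F(n) off-diagonal, F(10) = 55.
        int[][] fib = {{1, 1}, {1, 0}};
        System.out.println(power(fib, 10)[0][1]);
    }
}
```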

In the meantime, Ilya has solved the remaining four problems and claimed the victory - congratulations!

And finally, let me finish this post with something different: a chess puzzle! More specifically, I want to share a wonderful gem of a chess website, lichess, that provides a training area with puzzles like this one: http://en.lichess.org/training/23280.

Their idea is that since many people are playing chess against each other on their website, they can simply find "interesting" situations in those games automatically - probably when the computer could see a unique sequence of good moves in a position that's losing otherwise - and present those situations as puzzles that make people learn real-world chess tactics, not just apply abstract backtracking skills. Quite fittingly, all puzzles ask you to find the best move for Black/White, not to find a mate in three or something like that, since the former is what actually matters in real chess games. In particular, the puzzle referenced above was taken from this blitz game, where the real black player could not find the right sequence of moves and lost.

The puzzles also have the usual up/down voting buttons, so as hundreds and thousands of people are going through them, the system can become even more intelligent and suggest only quite interesting puzzles to most. The puzzle linked above and shown on the left was presented to me recently, and I've enjoyed solving it - did you?

Also, I can't help but wonder if we can come up with a similar training approach for algorithmic programming based on the enormous base of contest solutions that we've accumulated over the years.

Thanks for reading, and check back next week!

Wednesday, January 14, 2015

This week in competitive programming

Qualification Round of Facebook Hacker Cup 2015 took place last weekend. There was no direct competition, though, since it lasted for 72 hours and featured three relatively easy problems, and you needed to solve just one to advance to the next round, so people were not competing against one another. It still has a scoreboard (available only after logging in to Facebook), and you can see the top 5 who rushed to solve the problems as soon as they were available on the left, out of 940 contestants who solved all three.

Let's return to problem "Nanobugs" mentioned in the last week's post: you have n (at least 20 and at most 50) coins, some of which are fake. Your friend knows that either a or a+1 coins are fake (a is between 1 and 3, and is given). You know the fake coins exactly, but you don't want to reveal that knowledge to your friend. Instead you want to prove your knowledge to your friend by proving that there are exactly x (either a or a+1) fake coins without revealing the status of any particular coin - neither fake nor real. The only method you can use for your proof is a very limited subset equality testing device: it takes two non-intersecting subsets of coins of the same size, and tells whether both subsets have the same number of fake coins or not.

I will describe the idea of the solution invented by my teammate Andrey Halyavin. First case is when just 1 coin is fake. In this case the friend knows that there are 1 or 2 fake coins. If the total amount of coins is even, then we can start by comparing one half of all coins with another half, with the result being of course that the two halves are not equal. What does your friend learn from this? One of the halves has all real coins, and another half has either 1 or 2 fakes, but he doesn't know which half, so we haven't yet revealed the status of any particular coin. Now let's continue similar comparisons, the i-th comparison putting coins i, i+1, ..., i+n/2-1 (wrapping around cyclically) on one side and all remaining coins on the other side. All of them will of course return "not equal" since exactly one coin is fake. Your friend still doesn't learn anything about the status of any particular coin, but the theory that there are 2 fake coins falls apart - since in that case at least one comparison would put them on different sides, and we'd get an "equal" result. So we've successfully convinced the friend that there's just 1 fake coin without revealing anything about any particular coin.
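This argument is easy to verify exhaustively for a small case. The sketch below (a toy check of my own, with n = 6) confirms that with one fake coin every half-versus-half comparison returns "not equal", while any placement of two fake coins makes at least one comparison return "equal":

```java
public class CoinSim {
    // One comparison: coins i..i+n/2-1 (cyclically) against the rest.
    // Returns true if both sides contain the same number of fake coins.
    static boolean equalSides(boolean[] fake, int i) {
        int n = fake.length, inside = 0, total = 0;
        for (int j = 0; j < n; ++j) if (fake[j]) ++total;
        for (int j = 0; j < n / 2; ++j) if (fake[(i + j) % n]) ++inside;
        return inside == total - inside;
    }

    public static void main(String[] args) {
        int n = 6;
        // With exactly one fake coin, every comparison must say "not equal".
        boolean oneFakeAlwaysUnequal = true;
        for (int p = 0; p < n; ++p) {
            boolean[] fake = new boolean[n];
            fake[p] = true;
            for (int i = 0; i < n; ++i)
                if (equalSides(fake, i)) oneFakeAlwaysUnequal = false;
        }
        // With two fake coins, some comparison must say "equal" -
        // that is what rules out the two-fakes theory.
        boolean twoFakesAlwaysCaught = true;
        for (int p = 0; p < n; ++p)
            for (int q = p + 1; q < n; ++q) {
                boolean[] fake = new boolean[n];
                fake[p] = fake[q] = true;
                boolean caught = false;
                for (int i = 0; i < n; ++i)
                    if (equalSides(fake, i)) caught = true;
                if (!caught) twoFakesAlwaysCaught = false;
            }
        System.out.println(oneFakeAlwaysUnequal + " " + twoFakesAlwaysCaught);
    }
}
```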

What to do if there's just one fake coin but the total number of coins is odd? It's not hard to see that the task is impossible. Every comparison has to leave at least one coin aside. If that comparison returns a "not equal" result, then the coins left aside must be real, since there's just one fake coin, so we would have to reveal their status if we are to prove that there's exactly one fake coin. And if that comparison returns an "equal" result, then the coins participating in the comparison must all be real, and their status is revealed just the same.

Now let's describe a solution for the case where there are either 2 or 4 fake coins (the alternative that we want to disprove is 1 or 3 fake coins). If the total number of coins is even, then the solution is very simple: since we know which coins are fake, let's just split all coins into two equal halves with the same number of fake coins in each, and compare them - the result will be "equal". This obviously means that the total number of fake coins is even, but we haven't revealed anything about the status of any particular coin, so we're done.

Handling the case where the total number of coins is odd is slightly more complex. Let's describe another solution for the case where the total number of coins is divisible by 5 and at least 10. We'll assemble a structure consisting of two "flowers" (pictured on the left): one (x, y) flower and one (y, x) flower. An (x, y) flower consists of a center containing x coins and four petals each containing y coins, for a total of x+4y coins. Similarly, a (y, x) flower contains y+4x coins, so we have 5x+5y coins in total. x and y can be arbitrary positive integers.

Now we'll do 4 comparisons, each putting x+y coins on each side. The left side will be the center of the first flower and one of its petals, and the right side will be the center of the second flower and one of its petals. The result of each comparison will be "equal". First, it's not hard to see that we learn that the centers of the flowers contain the same number of fake coins: if one center had more fake coins, then the other flower would have to compensate for that in each petal, and we'd get at least 4+1=5 fake coins, while we know we have at most 4. Having seen that, it's not hard to see that the corresponding petals of the flowers also have the same number of fake coins, since they compare equal modulo the centers, which are also equal. But that means that the total number of fake coins is even, since we have found a match for every group with the same number of fake coins - and that means we've proven what we needed to our friend. Note that we have not revealed the status of any particular coin: there are many ways to pick an arbitrary even number of fake coins in the centers and petals.
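Here's a brute-force sanity check of this argument on a made-up small instance with x = 1 and y = 2 (15 coins, sizes chosen by me): among all ways to place at most 4 fake coins that pass all four comparisons, the number of fake coins is always even.

```java
public class FlowerCheck {
    public static void main(String[] args) {
        // Flower 1 ((x, y) = (1, 2)): center {0}, petals {1,2},{3,4},{5,6},{7,8}.
        // Flower 2 ((y, x) = (2, 1)): center {9,10}, petals {11},{12},{13},{14}.
        // Comparison k: center 1 + petal k of flower 1 vs center 2 + petal k of flower 2.
        int[][] left = {{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {0, 7, 8}};
        int[][] right = {{9, 10, 11}, {9, 10, 12}, {9, 10, 13}, {9, 10, 14}};
        boolean allEven = true;
        int passing = 0;
        for (int mask = 0; mask < (1 << 15); ++mask) {
            if (Integer.bitCount(mask) > 4) continue; // at most 4 fake coins
            boolean ok = true;
            for (int k = 0; k < 4 && ok; ++k) {
                int a = 0, b = 0;
                for (int c : left[k]) if ((mask >> c & 1) == 1) ++a;
                for (int c : right[k]) if ((mask >> c & 1) == 1) ++b;
                if (a != b) ok = false; // this comparison would say "not equal"
            }
            if (ok) {
                ++passing;
                if (Integer.bitCount(mask) % 2 == 1) allEven = false;
            }
        }
        System.out.println("passing=" + passing + " allEven=" + allEven);
    }
}
```

The count of passing arrangements also shows that many different fake-coin placements are consistent with the observed results, so no individual coin's status is revealed.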

This solution only works if the total number of coins is divisible by 5, but we also have a solution for the case where the number of coins is divisible by 2, and both of those solutions simply prove that the total number of fake coins is even. So we can just put two such solutions side by side and achieve a solution for any n=5p+2q where p and q are at least 3, and it's not hard to see that any odd number above 20 can be represented in this manner (for example as 15 plus an even number). So we've finally solved the 2 and 4 cases completely.

The only remaining case is 3 fake coins. I will not describe the solution in detail here; it's conceptually similar to the 2-or-4 case, but instead of proving that the total number of fake coins is even, we will prove that it is divisible by 3, using three flowers with three petals each instead of two flowers with four petals each - two (x, y) flowers and one (y, x) flower.

Thanks for reading, and check back next week! Also, please share your impressions of Andrey's solution, or any alternative solutions you see, in the comments.