Sunday, February 22, 2015

This week in competitive programming

Codeforces Round 292 took place late on Tuesday (problems, results, top 5 on the left). Haghani has pulled off an impressive victory given that he was not the rating favorite - but of course he did climb much higher in the ratings as a result, now he's ranked 28th. Congratulations! rng_58, on the other hand, could've probably been the rating favorite if not for his recent strategy of solving the hardest problem first. Once again he became the only person to solve the said hardest problem, but couldn't solve two other problems in time and only placed 10th.

TopCoder SRM 650 started next morning less than 8 hours after the Codeforces round ended (problems, results, top 5 on the left), so big congratulations to Endagorion on winning it in addition to placing 3rd earlier, and for being the only one to solve the hard problem!

There were also four Open Cup rounds in January and February. The problems from those rounds are not published yet, but it's been a month now and it's boring to wait longer, so I will try to explain the problem statements in full below.

Open Cup 2014-15 Grand Prix of Japan took place three weeks ago (results, top 5 on the left). Problem I was, on the surface, a repeat of a problem from Petr Mitrichev Contest 9 back from 2011: it simply asked to count the number of pairs of strings (s, t) such that the length of s is N, the length of t is M, each character is from an alphabet of size A, and t is a substring of s. When this problem was given on my contest, N, M and A were up to 100, and you needed to output a floating-point number - the probability that a random pair has this property. No team could solve this problem during the contest - take a look at problem H in standings. The reference solution made heavy use of the fact that we need to output a fraction and are allowed 10-9 error, since for longer substrings the probability is almost zero, and for shorter substrings we can iterate over all possible prefix functions of the substring, and when the prefix function is fixed counting superstrings is a relatively straightforward dynamic programming on the Knuth-Morris-Pratt automaton.

Fast forward 3.5 years, and now the harder version is given on a contest: now N is up to 200, M is up to 50, A is up to 1000, and you need to find the answer modulo 109+7, so you can't squeeze by with approximation arguments anymore. And one team does solve this problem - congratulations to Alex, Slava and Artem!

The key to solving the harder version is to notice that we don't, in fact, need the entire prefix function of the substring to count the number of superstrings containing it: it turns out that we can just look at the set of prefixes of the substring that are also its suffixes, in other words at the set of its borders. Can you see why the borders are enough?

Thanks for reading, and check back next week for the full explanation of this problem's solution, and for a new hard problem!

Thursday, February 19, 2015

This week in competitive programming

Theorem (Generalized Cayley's formula?).

Given a graph with n labeled vertices, let s be the number of its connected components and m1,m2,...,ms be the numbers of vertices in those components. Then the number of ways to add s-1 edges to this graph so that it becomes connected is ns-2∏mi.

When Mikhail `Endagorion' Tikhomirov told me this theorem, I was very suprised that it looks so simple yet I've never encountered it before. Is it a well-known result? Does it have a name?

This theorem comes in handy when solving a problem from previous week's summary: how many simple undirected graphs are there with n labeleld vertices and k bridges?

Here's how we can solve this problem: in addition to the numbera an,k of graphs with n vertices and k bridges, we'll also count the number bn,k of connected graphs with n vertices and k bridges (including, as k=0 case, the number of biconnected graphs), the value cn,k which is the sum over all graphs with n vertices and k connected components, such that all its connected components are biconnected, of the products of sizes of the components, and finally the number dn of connected graphs with n vertices.

Let's start with bn,k with positive k. Since the graph has at least one bridge, the bridges split it into several biconnected components, and the Generalized Cayley's formula mentioned above helps count the ways to join those biconnected components with bridges into a connected graph, so we get bn,k=nk-1*cn,k+1. Now bn,0 is simply dn-sum(bn,k).

To find cn,k, we first notice that cn,1=n*bn,0. For larger k, we should consider how many vertices are there in the biconnected component the contains the first vertex - let's say it has m vertices, then we have to multiply the number of ways to pick the remaining m-1 vertex in the component by the numbe of ways to pick the remaining components. In other words, cn,k=sum(comb(n-1,m-1)*cm,1*cn-m,k-1).

Finding dn is a standard question that is solved in a similar way: connected graphs are all graphs minus disconnected graphs, and disconnected graphs can be categorized by the size of the connected component containing the first vertex. In other words, dn=2n*(n-1)/2-sum(comb(n-1,m-1)*dm*2(n-m)*(n-m-1)/2).

And finally, we can find the value that we need: an,k. Again, let's look at the connected component containing the first vertex, and suppose it has m vertices and p bridges. Then we get an,k=sum(comb(n-1,m-1)*bm,p*an-m,k-p).

We compute O(n2) values, spending O(n) for each value in b, c and d, but O(n2) for each value of a, so the total running time is O(n4).

Here's the code:
public class Fragile {
    static final long MODULO = (long) (1e9 + 7);

    public int countGraphs(int N, int K) {
        long[][] a = new long[N + 1][N + 1];
        long[][] b = new long[N + 1][N + 1];
        long[][] c = new long[N + 1][N + 1];
        long[] d = new long[N + 1];
        d[0] = 1;
        long[][] comb = new long[N + 1][N + 1];
        comb[0][0] = 1;
        a[0][0] = 1;
        long[] p2 = new long[N * (N - 1) / 2 + 1];
        p2[0] = 1;
        for (int i = 1; i < p2.length; ++i) {
            p2[i] = 2 * p2[i - 1] % MODULO;
        }

        for (int n = 1; n <= N; ++n) {
            comb[n][0] = 1;
            for (int k = 1; k <= N; ++k) {
                comb[n][k] = (comb[n - 1][k - 1] + comb[n - 1][k]) % MODULO;
            }
            d[n] = p2[n * (n - 1) / 2];
            for (int m = 1; m < n; ++m) {
                d[n] = (d[n] - comb[n - 1][m - 1] * d[m] % MODULO * p2[(n - m) * (n - m - 1) / 2] % MODULO + MODULO) % MODULO;
            }
            for (int k = 2; k <= n; ++k) {
                for (int m = 1; m < n; ++m) {
                    c[n][k] = (c[n][k] + comb[n - 1][m - 1] * c[m][1] % MODULO * c[n - m][k - 1]) % MODULO;
                }
            }
            b[n][0] = d[n];
            long pw = 1;
            for (int k = 1; k < n; ++k) {
                b[n][k] = pw * c[n][k + 1] % MODULO;
                pw = pw * n % MODULO;
                b[n][0] = (b[n][0] - b[n][k] + MODULO) % MODULO;
            }
            c[n][1] = n * b[n][0] % MODULO;
            for (int k = 0; k < n; ++k) {
                for (int m = 1; m <= n; ++m) {
                    for (int p = 0; p <= k; ++p) {
                        a[n][k] = (a[n][k] + comb[n - 1][m - 1] * b[m][p] % MODULO * a[n - m][k - p]) % MODULO;
                    }
                }
            }
        }

        return (int) a[N][K];
    }
}
Do you know a simpler way to solve this problem? Do you know a faster way to solve this problem?

My solution at the contest was even more complicated with O(n4) states and O(n6) running time, so it didn't complete in 2 seconds and I had to precompute the answers and submit those. I've used the following dynamic programming: how many graphs are there with n vertices, m vertices in the biconnected component containing the first vertex, k bridges, and p bridges going out from the biconnected component containing the first vertex.

Now, let's turn to last week's only regular online competition - TopCoder SRM 649 (problems, results, top 5 on the left). Fourteen people solved all three presented problems, but sdya has managed to create a comfortable 100+ point margin with fast solutions and a challenge - congratulations!

Thanks for reading, and check back next week!

Friday, February 13, 2015

This week in competitive programming

TopCoder SRM 648 has started the last week's contests as early as Monday (problems, results, top 5 on the left, my screencast). The hard problem, while innocent on the surface, turned out to be exceptionally tricky, requiring some serious combinatorics and dynamic programming mastery - check it out! The statement is as simple as: how many simple undirected graphs are there with N labeled vertices and K bridges? N is up to 50.

Codeforces Round 290 took place just several hours later (problems, results, top 5 on the left, my screencast). Problem D was another cute combinatorics exercise. Its most interesting part boiled down to: given a tree, how many ways are there to remove its leaves one by one until we remove everything? Of course, new leaves may appear after removing some. Had the tree been rooted, this would be a pretty standard dynamic programming application, but how to deal with the fact that there's no root?

Finally, Rockethon 2015 gave everybody a shot at prizes on Saturday (problems, results, top 5 on the left). Gennady has carefully progressed through all problems, not really leaving anybody else a chance - congratulations!

There were also several Open Cup rounds in the past weeks, and the Petrozavodsk training camp has assembled an unprecedented number of ACM ICPC teams - but as the same problems are still going to be used for other training camps, I will postpone discussing those for now. Stay tuned for some really tough problems in the coming weeks!

Wednesday, February 4, 2015

This week in competitive programming

Facebook Hacker Cup 2015 has completed its weekly routine with Round 3 on Saturday night (problems with Facebook login, results with Facebook login, top 5 on the left, my screencast). Top 25 will travel to the United States for the final round on March 6, and big congratulations to Gleb who claimed the first place in this really competitive round!

Let's also come back to the interesting problem I mentioned in last week's summary: you are given a directed graph where each arc has a cost, one of the vertices is marked as starting point, and one as ending point. You need to pick a subset of arcs with the smallest total cost such that every path from the starting point to the ending point (including paths that pass through the same vertex or arc more than once) passes through arcs from this subset exactly once.

Whenever we need to pick a subset of arcs that splits a graph in two, the theory of flows and cuts comes to mind. However, we're not simply asked to find a minimum cut on a given graph, and not even on its condensation, as illustrated by the following acyclic example:

In this graph, the minimum cut is of size 2 and separates vertices S and A from vertices B and T. However, the path S-B-A-T crosses the cut three times (twice going outside, and once going inside), and we need for each path to cross the cut exactly once. Basically, the minimum cut is solving the "at least once" version.

But the power of the flow/cut theory often lies in applying it to slightly modified graphs. For cuts in particular, adding infinite arcs to the graph is usually the solution of choice. Here our goal is to make sure one can't re-enter the cut after leaving it, so for each arc we'll add a reverse arc with infinite capacity, as illustrated by the following picture:

In particular, the newly added A-B arc of infinite capacity changes the incorrect cut displayed above from capacity 2 to infinite capacity (1+1+infinity), and thus the smallest cut is now of size 3, displayed by the dotted line to the left. There's also another cut of size 3 - just cut off the source vertex S. It's not hard to verify that every path from S to T croses this cut exactly once.

Another great thing about this addition of infinite arcs is that it allows us to skip the condensation step. If there's a cycle somewhere in our graph, we'll thus add a reverse infinite cycle, and thus no minimum cut will ever cross any cycle!

Such observations make this solution "click" when you see it: instead of fighting with the problem, everything flows naturally and the solution makes sense from all angles. I especially value such moments, I think they show the beauty of mathematics and algorithms.

However, everything was not that simple in this problem :) It turns out that in order for this solution to work correctly in all cases, we need to prune the graph before adding the infinite arcs. More specifically, we need to remove all vertices that don't lie on any path from S to T from the graph. Can you see where we went wrong, and why doesn't the cut theory save us in this case?

Thanks for reading, and check back next week!