Thursday, March 3, 2016

A solution week

The Feb 15 - Feb 21 week, somewhat surprisingly, didn't have any algorithmic contests among the ones that I usually cover. However, let's use this opportunity to dive into the solutions of the problems I mentioned before.

The first one was: you are given a set with 200000 numbers, and need to find the subset of that set where the difference between the mean and the median is the highest. The most tricky part of this problem is the median: unlike the mean, it's hard to update incrementally, so the natural idea is to start by iterating over which number will be the median (let's assume we'll pick an odd-sized subset for now). Having fixed the median, we need to add k numbers smaller than it and k numbers larger than it, maximizing the mean. Now it's easy to see that those numbers should be as large as possible: k larger numbers, plus k largest numbers right before out median in the sorted order. Now, if we iterate over all possible values of k, we obtain a O(n2) solution.

In order to speed up this solution, let's look more closely what happens when we increase k. We add two numbers to our set - the kth largest one, and the kth largest one among the ones before the median. Let's call those two numbers a and b. The mean before adding those numbers is c/d (where d is 2k+1 or something like that), and now it becomes (c+a+b)/(d+2). Is it smaller or larger than c/d? Well, the mediant inequality tells that this number lies between c/d and (a+b)/2. So if (a+b)/2 is larger than the current mean, the mean will increase, otherwise it will not. The final step is to notice that (a+b)/2 is always going down since we pick a smaller number for both a and b at each subsequent step. Because of this, we can see that the mean is a bitonic function of k: first increases, then decreases.

This allows us to use binary search on k, comparing c/d with (a+b)/2 inside to see if we should go to the left or to the right (of course, we could just compare two adjacent means, but that might overflow the 64-bit integer type), and obtaining a O(nlogn) solution for the problem.

The second one was: given a bipartite graph with 45 vertices, find how many subsets of its vertices are vertex-dominating. A subset of vertices is vertex-dominating if any vertex of the graph either lies on this subset, or has a neighbor in this subset, or both.

Given that we have 45 vertices, the logical first step is to pick the smaller of its two parts, which will contain at most 22 vertices. Now we need to find something that works in O(2n) or similar complexity.

We can now afford to iterate over which subset of this part will be included in the dominating set. Having fixed that, we still have to cover:

1. The vertices in the first part that are not included in the subset.
2. The vertices in the second part that are not adjacent to any vertex from the subset.
Part 2 is easy: we must just include all those vertices into our dominating set. Those vertices, in turn, will cover some vertices from part 1, but there might be some more left over. Now our state is: we have already included some subset A of the first part and some subset B of the second part in the dominating set, the entire second part is covered, and in the first part there's some uncovered subset C which we're still to cover using the some subset of the set D of the vertices of the second part, where D is the complement of B.

This could again be solved via dynamic programming in O(|D|*2|C|) time, yielding overall O(n*3n/2) complexity for the entire problem. However, for n=45 that is too slow.

Here comes the most magical moment of this solution: actually, the set D doesn't matter that much, so we can do this dynamic programming just once instead of separately for every A. One might say that of course D does matter, as we can otherwise pick a vertex from B when covering C, and overcount. However, from the construction above we can notice that vertices from B and C don't have any edges between them, and thus don't affect the cover - we can take an arbitrary subset of B in addition to some subset of D. Because of this, we'll actually count each solution exactly 2|B| times, and we can just divide the result of the inner dynamic programming by 2|B| to get the required count! Now we have a O(n*2n/2) solution.

Thanks for reading, and please cheer for me in the upcoming Hacker Cup onsite final round, which is scheduled to begin at 13:30 London time today!