Thursday, October 14, 2010

TopCoder Open 2010 Algorithm Wildcard Round

Here will go the comments for Algorithm Wildcard Round of TopCoder Open 2010.

18:22 - it's very quiet in the arena, since most people who are not competing went to enjoy the fabulous Las Vegas (or just went to their rooms to get some sleep?). Most of algorithm competitors are here, though, even the ones who're not competing.

18:27 - I'm sitting in front of dzhulgakov's and syg96's screens. Hope that those two won't disappoint us, as rng_58 did in the semifinal.

18:30 - go!

18:30 - the easy problem: you start with N distinct numbers, and then can swap two adjacent numbers M times. How many different sequences can you get?

18:31 - N and M are up to 2000.

18:32 - so if a sequence X has Y inversions (pairs of indices such that the numbers at these indices are not in the same order in the original sequence), then we can get it using Y, Y+2, Y+4, ... swaps. The problem is thus reduced to counting sequences which have at most M inversions, and have the same amount of inversions modulo 2. I think this is done via quite simple DP - if you know the first number, you know how many inversions it creates, and then continue with the rest. This should work in O(2000^2).

18:35 - everybody is doing the easy problem, no variability from the contestants this time.

18:37 - PaulJefferys has submitted and then found out that his code TLs on (2000, 2000) :( Thanks Rustyoldman for the pointer!

18:40 - oh, AS1_PML30 has pointed out that my above solutions works in O(2000^3).

18:40 - dzhulgakov submits! And the DP is not so difficult to code in O(2000^2) by noticing that we can use accumulated sums to process each state in O(1).

18:42 - the medium problem: you're given at most 50 cards of form "+2", "*3". You shuffle then randomly, and then lay them down next to "0" to get an expression like "0+1-2*3", which is equal to -5. What is the expected value of this expression?

18:44 - first, we can notice that the order of "+" and "-" cards is not important. We can imagine the problem as starting with one "0" card and several "+X" and "-Y" cards, and then we need to place all "*Z" cards next to one of the starting cards, and the order of doing so for each starting card matters.

18:47 - maybe one can just do DP on the number of remaining cards of each type ("*1", "*2", ..., "*9", "*0")? That should be at most 5^10 states, which is about 10 million, but the transition will be quite slow.

18:54 - or maybe we can rely on the fact that we only need to find the expected value somehow? People seem to be at a loss around here.

18:54 - it turns out it's just me who's at a loss. Let rng_58 explain.

19:00 - still discussing the solution...

19:09 - so we need to calculate the expected value for each additive term, then sum those up. That means we should calculate just one number that depends on the multipliers: the expected multiplier for each additive term. At the simplest, we could iterate over all possible at most 5^10 compositions of a multiplier and average those with appropriate weights (some factorials multiplied and divided :)).

19:11 - bmerry suggests we can do it even faster: let's add multipliers one by one, and keep track of the expected value of the product if we have exactly k multipliers for each k, and the number of ways to get exactly k multipliers. Then this will be just something like O(50^2).

19:13 - another spectator reminder from Rustyoldman: everybody had submitted the easy and everyone is doing the medium... Not anymore! grotmol has opened the hard!

19:15 - here's the hard problem: how many ways are there to select some cells in a H times W rectangle such that no cells in one row are closer than K from each other?

19:16 - and the juicy part is that H, W and K are up to 1 billion!

19:19 - one simple part of solution by Onufry: we can forget about H and just take the answer to the power of H and subtract 1 in the end.

19:21 - another idea is that we need to calculate the number modulo 100003 which is a relatively small prime number.

19:25 - people are thinking about the hard problem. Most contestants are working on the medium, still no submissions. Two have opened the hard.

19:26 - dzulgakov submits the medium! I'm not sure what exactly his solution does, but it separates the additive and multipliers, and does some factorials afterwards :) He passed all examples.

19:32 - four people have submitted the medium now. The communal wisdom suggests that if we fix the number r of rooks in one row, then the number of ways is (W-K*(r-1)) choose r. Where do we go from here?

19:34 - and that's where the small prime number calls into play! In order to calculate a choose b modulo p, we can write both a and b in p-ary system, calculate digit-by-digit "choose" numbers, and multiply them. And to calculate those, we can pre-calculate all factorials modulo p and their inverses.

19:39 - so if the number r is relatively small, say, up to 10 million, we can do this. And when it's more than that, it means K is not more than 100 and thus we can do the usual matrix multiplication to find the number of ways (using recurrence f(n)=f(n-1)+f(n-K)).

19:41 - this solution was brought to you by ploh, Onufry, bmerry and jdmetz :)

19:44 - bmerry suggests we can avoid the matrix multiplication by grouping the r's which are the same modulo p and applying the above theorem about the number of combinations to each of those sums.

19:47 - nobody is writing anything that resembles a solution for the hard. Paul has calculated inverses modulo p, gmark has found the answers for small parameters and is staring at them. Unless we're missing a really simple solution, no hard will be solved in this round.

19:50 - izulin and grotmol are still working on their mediums, all others either do nothing or recheck their solutions.

19:52 - I believe there were no submissions in the past 30 minutes. dzhulgakov, Vasyl, syg96 and PaulJefferys have two problems (in this order), tomek, izulin, gmark and grotmol have one problem (in this order). Each of the groups of four is separated by less than 50 points, so any challenge in the first group brings the challenger to the top (and probably to the finals :)).

19:55 - The coding is over. I wonder what is the best strategy for the challenge phase. The medium has expected values so I bet it's pretty hard to fail the systests if you pass examples. Maybe when all numbers are multipliers, or when there are no multipliers at all? For the easy I can only imagine the tricky case when M is more than N*(N-1)/2.

20:01 - tomek tried the maxtest against gmark.

20:03 - the challenge phase is pretty calm, most people take a long time to read one solution.

20:05 - but just one challenge from syg96 or PaulJefferys can change who goes to the finals!

20:10 - a desperate last second challenge by gmark, and that's it. Waiting for the systests!

20:18 - most people agree that probably no solution will fail. More interestingly, Ivan Metelsky has confirmed that our solution for the hard is the expected one!

20:23 - Vasyl's and syg96's mediums have failed. dzhulgakov and Paul go to the finals! Stay tuned tomorrow for the coverage of the finals.

20:27 - here's the reason for the failure of the mediums: if you add up all additive terms in advance, you pass. But if you first multiply them and then add up many numbers, and when the expected answer is ZERO, but the multipliers are big - you're doomed, since small error relative to the product of the multiplier becomes big absolute error. That's a cool bug to fail to :)


  1. 19:44 – bmerry suggests we can avoid the matrix multiplication by grouping the r‘s which are the same modulo p and applying the above theorem about the number of combinations to each of those sums.

    Is it true? Even though the r have the same modulo p. Lucas' theorem does not just use the modulo of the number but the p-ary representation, I am not sure if the p-ary representations will be the same for any two different rs that are equal modulo p.

  2. I'm sorry I wasn't clear enough here :)

    Suppose we're looking at the sum
    combinations[n-k*r][r] over all r of form r=p*x+c, where c is fixed. This gets us:

    sum(combinations[n-k*(p*x+c)][p*x+c])=sum(combinations[n-k*p*x-k*c][p*x+c])=(by Lucas' theorem) sum(combinations[(n-k*c) div p-k*x][x]*combinations[(n-k*c) mod p][c])=sum(combinations[(n-k*c) div p-k*x][x])*combinations[(n-k*c) mod p][c].

    The second multiplier doesn't depend on x, and has both parameters less than p. The first multiplier, a sum, almost doesn't depend on c - more precisely, we have at most k such sums (since k*c is between 0 and k*p, so this gets reduced to between 0 and k when divided by p).

    You can take a look at bmerry's solution in the practice room for an implementation.