Friday, October 15, 2010

TCO 2010 Finals Hard

With no better idea of spending some time on the Las Vegas - New York plane, I've decided to try coding the hard problem from the TCO 2010 finals (statement linked). This took me 26 minutes and I got 618 points in the practice room — knowing the solution in advance and using IDEA instead of a simple text editor (rng_58's time in the finals was 48 minutes, for 463 points).

I hope the code with the below explanations can serve as a semi-editorial while we're waiting for the official editorial.

What is TopCoder Mod Dash?

At this year's TopCoder Open, I've naturally enjoyed watching the algorithm and marathon competitions. However, there was one more which I enjoyed and found very watchable and intense: the Mod Dash competition. So here's a small educational video about what this competition is about:



There's also another video that I've made at the TCO, which is the usual overview of the arena. It is a bit boring, but just in case, here it goes:

TopCoder Open 2010 Algorithm Final Round

Here will go the comments for Algorithm Final Round of TopCoder Open 2010, the pinnacle of the event.

13:16 - 15 minutes before the start. Today's lineup: ACRush, liympanda, Louty from China, dzhulgakov and Klinck from Ukraine, PaulJefferys from UK, bmerry from South Africa and rng_58 from Japan.

13:18 - ACRush, liympanda, bmerry and PaulJefferys are returning participants of the TCO onsites, while Klinck, dzhulgakov, Louty and rng_58 are here for the first time. Klinck has been participating in TopCoder for 7 years, though, and has almost always been near the top of "Submission accuracy" rating. The effort has finally paid off!

13:23 - watching the screens of dzhulgakov and Klinck today.

13:25 - the introduction has started.

13:29 - 1 minute before the start.

13:34 - the easy problem: you have up to 50 election districts, and the voting happens the following way: in i-th district, there are voters[i] voters and each of them votes for somebody. Then, for each district the candidate who's got the largest number of voters wins. In case of a tie, nobody wins. You are asked about the smallest number X such that when X people vote for your party's candidates, no matter which X people, you're still going to have strictly more than half of all winners.

13:39 - the solution is quite easy and suggested by the examples. In the worst case, in order NOT to win you can get all votes in one half of the districts (the largest ones), and one less the half votes in all other districts. This requires care about ties, and different handling for odd/even number of total districts. Most people will submit something like this, but I expect an eventful challenge phase here.

13:40 - submissions from PaulJefferys, rng_58, dzhulgakov and ACRush. Only rng_58 have opened the medium.

13:41 - syg96 suggests a solution that will probably have less corner cases: iterate over how many district wins, draws and losses are we going to have (in the worst case of not winning). It's quite obvious that wins should go in the largest districts, then draws, then losses. Then just take the maximum over all scenarios where we don't win plus one.

13:42 - the medium problem: what is the smallest area of a square with vertices in lattice points that contains exactly n lattice points inside? n is up to 10^12.

13:44 - a submit on easy from Klinck.

13:44 - suppose the side vector of the square is (x, y). Then the number of lattice points inside is x^2+y^2-2*gcd(x,y)+1, according to the Pick's formula.

13:48 - this should be equal to n. Since we need to minimize the area, we need to minimize gcd(x,y).

13:50 - a lot of numbers are representable as sum of two squares. More precisely, all prime factors of form 4k+3 should have even exponent in the prime factorization.

13:51 - meanwhile, PaulJefferys have resubmitted the easy and liympanda has submitted it.

13:56 - so I would guess the following should work: we iterate over possible value for g=gcd(x,y). When that value is fixed, we get x^2+y^2=n+2*g-1. We substitute x=g*p, y=g*q, then we get p^2+q^2=(n+2*g-1)/(g*g). So we need to check if there's a sum of squares of relatively prime numbers equal to the given number. And for that, we can just do bruteforce since we'll need to check sqrt(n) values for g=1, sqrt(n)/2 values for g=2, and so on, for the total of sqrt(n)*log(n) values. This should run fast enough.

13:56 - dzhulgakov seems to have a working solution in terms of time limit, but it doesn't pass the examples.

13:57 - meanwhile Louty submits the easy, and everyone is working on the medium, including bmerry who didn't submit the easy.

13:58 - dzhulgakov fixes his bug, but still unsure about TL. ACRush has already submitted some time ago.

14:00 - rng_58 submits the medium, too.

14:01 - dzhulgakov speeds up his code from about 1s to about 0.2s, but seems to give WA now. He's looking into STL docs now.

14:03 - rng_58 have opened the hard, dzhulgakov has found his bug (int instead of long long). Now his sol works in 0.2s and seems correct. Going to submit?

14:04 - dzhulgakov submits.

14:06 - the hard problem: you are given a 44x44 rectangle of 'R', 'G' and 'B' characters. You consider all possible ways to take two (possibly intersecting or partially lying outside) kxk squares, then take all of the jewels inside at least one of those squares, then discard some of them and place all the rest in one chain. How many different chains can you get, modulo 10^9+9, if two chains that are reverses from each other are considered equal?

14:11 - the obvious idea for the first step is considering all pairs of two squares, and recording how many Rs, Gs and Bs are there in their union. This can be checked in O(1) for each pair if we precalculate partial sums of upper-left rectangles.

14:11 - meanwhile, liympanda and PaulJefferys also submit the medium.

14:13 - ACRush still didn't open the hard, he's checking his solution for the medium. People watching his screen say he didn't find any bug, he's just checking it very thoroughly. Maybe he knows about a problem but is not sure if it can be triggered by the possible testcases?

14:15 - then what we can do is iterate over possible values of how many Rs and Gs are there going to be in the chain. For each such pair, we calculate the largest amount of Bs possible, and now we need to calculate the number of chains with this many Rs and Gs and at most this number of Bs in O(1).

14:17 - if we forget about "up to reverse" condition, the formula is quite straightforward: ((R+G) choose R) times (1+((R+G+1) choose 1) + ((R+G+2) choose 2)+...+(R+G+B) choose B). How do we find that in O(1)?

14:19 - I guess we can just precalculate that for all values of R+G (which are up to 2*44*44 in total) and B (up to 44*44), that should be fast enough. So the only thing to deal with is "up to reverse" condition.

14:24 - And this is dealt with as usually by finding how many stay the same up to reverse (this means looking at a sequence of length n/2, with probably special casing for the middle cell when n is even), and then dividing all the remaining amount by 2 since all others split into pairs. answer_up_to_reverse=(answer+answer_thats_same_as_reverse)/2. This can also be viewed as the Burnside's lemma application.

14:24 - meanwhile, ACRush has resubmitted the easy which drops him to 5th in the current standings. Everyone has submitted the medium, bmerry still didn't submit the easy. rng_58 is first, dzhulgakov is second.

14:29 - now that we seem to have solutions for all problems, let's just watch what the contestants do.

14:31 - ACRush, rng_58 and bmerry seem to be coding something similar to the above solution. Most others are thinking. dzhulgakov is rechecking his easy solution.

14:33 - or maybe he's not? he has the easy code open, but maybe he's thinking about the hard on paper?

14:37 - some discussion of the easy between the spectators suggest it's quite hard to do it by considering all cases - the solution suggested by syg96 above (and dzhulgakov has implemented something very similar) is indeed the way to go.

14:41 - rng_58 is already testing his hard, not sure if it's the version without "up to reverse" or the final one.

14:42 - rng_58 has a bug in counting the number of Rs, Gs and Bs: x and y swapped in one place.

14:43 - liympanda resubmits the easy as well! This is going to be a hell of a challenge phase.

14:45 - everyone except dzhulgakov and liympanda is working on the hard, but it seems like not many are close enough to submit in 10 minutes.

14:47 - Louty gives up. bmerry, ACRush and rng_58 are all debugging one different example cases. Paul and Klinck are still coding.

14:48 - Klinck gives up as well, tests his medium. Maybe giving up now will actually prove crucial for the challenge phase?

14:49 - or maybe not! rng_58 passes all example cases, tests for TL...

14:51 - and he submits!

14:52 - and ACRush submits, too. He's above but the difference is less than 5 points. I guess challenges WILL be important, just not for everybody :)

14:54 - ACRush still looking at his hard - this is probably a bad idea. Should be working on challenge cases instead. Nobody else submits, the coding is over.

14:56 - PaulJefferys is preparing a large random testcase for ACRush and rng_58's hards. We'll see if those seem to be correct at the very beginning at the challenge phase. That will probably affect the further challenging strategy of everyone.

15:00 - most people went for 250s, bmerry is reading 500s. Paul decided not to blind-challenge the hards.

15:01 - ACRush successfully challenges Louty's solution!

15:02 - and there goes dzhulgakov's easy.

15:04 - the challenge case for both is "10, 10, 5, 4, 1" with the correct answer of 25. The trick here is that one needs to win the first two and draw the FOURTH in the worst case, while dzhulgakov only considered winning some largest ones and drawing some next largest ones.

15:08 - just one minute left, still no challenges on the hards. C'mon, Paul!

15:10 - Paul looked at the hards but decided not to challenge. ACRush got -25 in the final seconds.

15:10 - let's wait for the systests...

15:16 - ACRush seems to be pretty confident in his solutions. His only concern is the doubles he uses in the 500.

15:17 - ACRush has also told me that he didn't test his 1000 at all except the examples, but he doesn't think there'll be any TL issues.

15:18 - I think I will take a break now before the results are announced.

16:00 - rng_58 is the champion! ACRush's hard, Paul's easy and liympanda's medium failed, all others passed.

16:01 - wata is the Marathon champion! chokudai is second, doudouille is third. Go Japan!

16:03 - as expected, Margarita is the Mod Dash champion!

16:05 - and djackmania is the Studio champion! I think that's it for today's live coverage. However, please check this blog later for some videos of the event.

Thursday, October 14, 2010

TopCoder Open 2010 Algorithm Wildcard Round

Here will go the comments for Algorithm Wildcard Round of TopCoder Open 2010.

18:22 - it's very quiet in the arena, since most people who are not competing went to enjoy the fabulous Las Vegas (or just went to their rooms to get some sleep?). Most of algorithm competitors are here, though, even the ones who're not competing.

18:27 - I'm sitting in front of dzhulgakov's and syg96's screens. Hope that those two won't disappoint us, as rng_58 did in the semifinal.

18:30 - go!

18:30 - the easy problem: you start with N distinct numbers, and then can swap two adjacent numbers M times. How many different sequences can you get?

18:31 - N and M are up to 2000.

18:32 - so if a sequence X has Y inversions (pairs of indices such that the numbers at these indices are not in the same order in the original sequence), then we can get it using Y, Y+2, Y+4, ... swaps. The problem is thus reduced to counting sequences which have at most M inversions, and have the same amount of inversions modulo 2. I think this is done via quite simple DP - if you know the first number, you know how many inversions it creates, and then continue with the rest. This should work in O(2000^2).

18:35 - everybody is doing the easy problem, no variability from the contestants this time.

18:37 - PaulJefferys has submitted and then found out that his code TLs on (2000, 2000) :( Thanks Rustyoldman for the pointer!

18:40 - oh, AS1_PML30 has pointed out that my above solutions works in O(2000^3).

18:40 - dzhulgakov submits! And the DP is not so difficult to code in O(2000^2) by noticing that we can use accumulated sums to process each state in O(1).

18:42 - the medium problem: you're given at most 50 cards of form "+2", "*3". You shuffle then randomly, and then lay them down next to "0" to get an expression like "0+1-2*3", which is equal to -5. What is the expected value of this expression?

18:44 - first, we can notice that the order of "+" and "-" cards is not important. We can imagine the problem as starting with one "0" card and several "+X" and "-Y" cards, and then we need to place all "*Z" cards next to one of the starting cards, and the order of doing so for each starting card matters.

18:47 - maybe one can just do DP on the number of remaining cards of each type ("*1", "*2", ..., "*9", "*0")? That should be at most 5^10 states, which is about 10 million, but the transition will be quite slow.

18:54 - or maybe we can rely on the fact that we only need to find the expected value somehow? People seem to be at a loss around here.

18:54 - it turns out it's just me who's at a loss. Let rng_58 explain.

19:00 - still discussing the solution...

19:09 - so we need to calculate the expected value for each additive term, then sum those up. That means we should calculate just one number that depends on the multipliers: the expected multiplier for each additive term. At the simplest, we could iterate over all possible at most 5^10 compositions of a multiplier and average those with appropriate weights (some factorials multiplied and divided :)).

19:11 - bmerry suggests we can do it even faster: let's add multipliers one by one, and keep track of the expected value of the product if we have exactly k multipliers for each k, and the number of ways to get exactly k multipliers. Then this will be just something like O(50^2).

19:13 - another spectator reminder from Rustyoldman: everybody had submitted the easy and everyone is doing the medium... Not anymore! grotmol has opened the hard!

19:15 - here's the hard problem: how many ways are there to select some cells in a H times W rectangle such that no cells in one row are closer than K from each other?

19:16 - and the juicy part is that H, W and K are up to 1 billion!

19:19 - one simple part of solution by Onufry: we can forget about H and just take the answer to the power of H and subtract 1 in the end.

19:21 - another idea is that we need to calculate the number modulo 100003 which is a relatively small prime number.

19:25 - people are thinking about the hard problem. Most contestants are working on the medium, still no submissions. Two have opened the hard.

19:26 - dzulgakov submits the medium! I'm not sure what exactly his solution does, but it separates the additive and multipliers, and does some factorials afterwards :) He passed all examples.

19:32 - four people have submitted the medium now. The communal wisdom suggests that if we fix the number r of rooks in one row, then the number of ways is (W-K*(r-1)) choose r. Where do we go from here?

19:34 - and that's where the small prime number calls into play! In order to calculate a choose b modulo p, we can write both a and b in p-ary system, calculate digit-by-digit "choose" numbers, and multiply them. And to calculate those, we can pre-calculate all factorials modulo p and their inverses.

19:39 - so if the number r is relatively small, say, up to 10 million, we can do this. And when it's more than that, it means K is not more than 100 and thus we can do the usual matrix multiplication to find the number of ways (using recurrence f(n)=f(n-1)+f(n-K)).

19:41 - this solution was brought to you by ploh, Onufry, bmerry and jdmetz :)

19:44 - bmerry suggests we can avoid the matrix multiplication by grouping the r's which are the same modulo p and applying the above theorem about the number of combinations to each of those sums.

19:47 - nobody is writing anything that resembles a solution for the hard. Paul has calculated inverses modulo p, gmark has found the answers for small parameters and is staring at them. Unless we're missing a really simple solution, no hard will be solved in this round.

19:50 - izulin and grotmol are still working on their mediums, all others either do nothing or recheck their solutions.

19:52 - I believe there were no submissions in the past 30 minutes. dzhulgakov, Vasyl, syg96 and PaulJefferys have two problems (in this order), tomek, izulin, gmark and grotmol have one problem (in this order). Each of the groups of four is separated by less than 50 points, so any challenge in the first group brings the challenger to the top (and probably to the finals :)).

19:55 - The coding is over. I wonder what is the best strategy for the challenge phase. The medium has expected values so I bet it's pretty hard to fail the systests if you pass examples. Maybe when all numbers are multipliers, or when there are no multipliers at all? For the easy I can only imagine the tricky case when M is more than N*(N-1)/2.

20:01 - tomek tried the maxtest against gmark.

20:03 - the challenge phase is pretty calm, most people take a long time to read one solution.

20:05 - but just one challenge from syg96 or PaulJefferys can change who goes to the finals!

20:10 - a desperate last second challenge by gmark, and that's it. Waiting for the systests!

20:18 - most people agree that probably no solution will fail. More interestingly, Ivan Metelsky has confirmed that our solution for the hard is the expected one!

20:23 - Vasyl's and syg96's mediums have failed. dzhulgakov and Paul go to the finals! Stay tuned tomorrow for the coverage of the finals.

20:27 - here's the reason for the failure of the mediums: if you add up all additive terms in advance, you pass. But if you first multiply them and then add up many numbers, and when the expected answer is ZERO, but the multipliers are big - you're doomed, since small error relative to the product of the multiplier becomes big absolute error. That's a cool bug to fail to :)

Wednesday, October 13, 2010

TCO 2010 Algorithm Semifinal 2

Here will go the comments for Algorithm Semifinal 2 of TopCoder Open 2010.

12:56 - announcement complete, 3 minutes before the start. Will be looking at rng_58's screen during the first minutes.

13:00 - Into the fray!

13:01 - the easy problem: given several points on the plane, what is the maximal possible f(X) over strings X of the given length L consisting of "LRUD" characters. f(X) is defined as the probability of a robot starting at (0,0) will end at one of the given points if it performs each command with probability 1/2.

13:02 - L is up to 50, the number of locations is also up to 50. Thinking...

13:07 - rng_58 has started coding. It seems that the solution is quite straightforward: iterate over the number of "L", "R", "U" moves - the number of "D" moves is the rest. Then we need add up the probabilities of hitting each cell, which can be calculated using pre-calculated arrays of "what is the probability to get number x if we sum a (-1)'s and b (1)'s, each with probability 1/2". We need O(50^3) for calculating these arrays, and then O(50^3) possible strings will take just O(50) time each, which should be fast enough.

13:09 - rng_58 seems to have coded exactly this, probably has an off-by-one error somewhere.

13:11 - found the bug, submitted! ACRush and PaulJefferys have submitted as well. rng_58 is testing the maxtest and verifying the code, not moving on to the next problem yet.

13:12 - here goes another problem (not sure if it's medium or hard): given an even amount (up to 50) of boxes with given amount of gold and cost for each, you need to split them into two halves such that each half has the same (n/2) amount of boxes, and the sum of average costs per gram for the two halves is the minimum possible.

13:14 - the amounts and costs are up to 500.

13:27 - Suppose we fix the total weight of all boxes in one half. Then we sum two fractions where the sum of numerators is constant, and the denominators are given. It's obvious that to minimize this sum, we need to minimize the numerator of the fraction with the lower denominator. That leads us to the following DP: out of first a boxes, we took b boxes with total weight w - what is the lowest total cost to achieve that (we can get the highest by looking at the complement)? This should have running time of 50*50*50*500 which seems too big. rng_58 seems to have coded and submitted this. He hasn't tested it on maxtest yet.

13:28 - he's ran the maxtest, and the solution is very fast. Why?

13:29 - probably since 50 should actually be 25 in the above formula.

13:29 - rng_58 still doesn't feel confident enough to go to the hard. Let's take a look at the hard ourselves.

13:30 - the problem statement of the hard is amazingly simple. Given three arrays A, B and C of at most 34 elements each, find the number of ways to choose at least k indices such that if we sum the values of A, B and C for the chosen indices, the A's sum is the largest.

13:32 - as dzhulgakov correctly suggests, 34 suggests this will be meet-in-the-middle :)

13:33 - rng_58 is still thinking about this, and so are we.

13:35 - there's a cool constraint that k isn't more than 7, so all sufficiently large sets are allowed. Inclusion-exclusion? We're at a crossroads :)

13:36 - the numbers are up to a billion, so no help here. Let's forget about k first and solve the problem about the total number of ways.

13:43 - together with Dmytro, we seem to have solved the one with no k: if we build a vector (A[i]-B[i], A[i]-C[i]) for each i, we need to calculate the number of sets of vectors such that their sum is in the 1st quadrant (has both coordinates positive). Suppose we split all vectors in two halves, and calculated all possible sums for each half. Then we iterate over the sums of the first half in the order of increasing x-coordinate, and build a range-sum tree by y-coordinate of the vectors from the second half that add up to a positive x-coordinate with the ones we've chosen from the first set. Then we need one range-sum query to find the number of elements in the second half that will pair with the given element from the first half.

13:50 - and now let's remember about k. Since k is small, we can instead find the number of sets with less than k elements, and subtract it from the overall count. And in order to do that, we can modify the above meet-in-the-middle algorithm so that splits the sums for each half into buckets with the given amount of indices, and... Wait! Since 34*33*32*31*30*29/1*2*3*4*5*6=1344904, we can just iterate over all subsets of at most k-1 elements and verify the condition for them. So this part of the problem is very easy :)

13:53 - now that we're done with the problems, let's take a walk to other people's monitors and discuss the submitted solutions.

13:54 - ACRush was struggling with TL in his hard solution, and now he's getting WA on one of the examples.

13:55 - People have pointed out that the range-sum tree is tricky here since the values are pretty huge, so you need to "compress" the indices and be very careful.

13:56 - there's a Studio competitors announcement during the round over the loudspeakers. That should be extremely annoying to the contestants.

13:57 - ACRush has fixed the bug and submitted! (or maybe it was not a bug but just some debugging run?)

13:59 - RAD seems to be getting memory limit in the medium. No wonder, since it seems to be a requirement to do the DP remembering only 2 outermost rows instead of all of them :)

14:00 - liympanda is also debugging his hard problem, but he has a binary search inside his solution for some reason. Maybe it's just for compressing the coordinates?

14:01 - no, he's using it to calculate the answer. That looks strange.

14:03 - griffon is still struggling with the easy. That strategy doesn't work, as I've rigorously proven in the first round :)

14:04 - PaulJefferys seems to be generating all sums, so that should be the part of meet-in-the-middle. He still has a long way to go.

14:05 - ACRush seems to be debugging his hard on a testcase of his own. I'm not sure if he's just testing it or if he's found some issue.

14:08 - Onufry submits the hard! His solution has very funny 6 nested loops in the end to iterate over all combinations to take the k parameter into account.

14:10 - going back to rng_58 - he seems to have got the solution ready, was testing it for some time, and finally submits!

14:12 - rng_58 prepares a large testcase and tests his solution...

14:13 - works in 0.229s!

14:14 - tomekkulczynski seems to have the correct idea in the hard as well, and he's also splitting the sums into buckets as we've suggested above before we saw the easier approach for the k part.

14:15 - 9 minutes to go in the round. ACRush and rng_58 have three, tomek, Paul and gmark have the first two, Onufry had only the hard, griffon has nothing, all others have only the easy.

14:17 - wata is doing the hard, and seems to work on some kind of meet-in-the-middle. However, the past 24-hour marathon seems to have taken its toll and he's writing code quite slowly. Somehow ACRush doesn't seem affected :)

14:19 - it's a bit sad that we can't view the competitors' code now, as we could endlessly speculate about possible bugs. Well, we'll have to watch them during the challenge phase and do our best!

14:21 - meanwhile, liympanda submits the hard.

14:22 - PaulJefferys is reading up some STL docs. I'm guessing he ran into problems with the range-sum tree. He seems to have given up.

14:24 - most people seem to be debugging frantically in the last minute. Onufry still can't get his easy to pass the examples. izulin submits the medium. A good challenge target?

14:27 - the problems all seem to be more difficult on the time limit side and not on the correctness side; so I'd expect all challenges, if any, happen because of time limits.

14:31 - izulin's medium is downed by RAD. ACRush has looked at it before that but decided not to challenge.

14:32 - rng_58 seems to be reading the chat history to verify if izulin's easy has already been challenged (I'd guess for time limit?). Hey, you can use right click --> history for that!

14:33 - another good shot by RAD! Onufry goes down. Let's look at RAD's screen :)

14:34 - a point for RAD - he first checks the history to make sure the problem hasn't already been challenged, as in that cases chances are probably much lower. That's what I should be doing!

14:35 - unsuccessful challenge on liympanda. RAD's challenge for the hard seems to be just large and random. So there's high chance liympanda will pass the systest.

14:36 - judging by RAD's successes, we'll see quite a few solutions failing systests as well.

14:37 - ACRush is looking at tomek's 500 very carefully.

14:39 - another two unsuccessful challenges for RAD. That probably makes sense since he's (almost?) the lowest scorer on the easy.

14:39 - tomek brings Paul's easy down! Yes, I expect the systests to be quite eventful.

14:44 - Paul says his easy has failed because the coordinates in the input can be up to 100, even though the length of the string is at most 50, so he only had a small array and probably died because of strange out-of-bounds access results.

14:46 - RAD told that his goal was at least +125, so he just challenged almost blindly. Worked out for +100, but not anymore :(

14:49 - everybody waiting for the results. It seems that ACRush and rng_58 are pretty solid, and will get into the finals even with one problem failing.

14:50 - actually that's not true. rng_58 can be fourth if his hard fails.

14:51 - so the systests only brought w_'s easy down. The finalists are ACRush, rng_58 and liympanda. tomekkulczynski, gmark, PaulJefferys and izulin go to the wildcard.

15:00 - Ivan Metelsky the Algorithm Coordinator told that the results for the second semifinal are as expected, while they wanted much more people to solve the medium in the first semifinal. Stupid me for not opening it :(

15:02 - speaking about the hard problem of the first semifinal - people have rightly pointed out that the inner DP actually runs in O(25*25), for the total running time of 1000*25^3, which is fast enough. The author of the problem can solve the problem in 1000*25^2 - so here's some homework for you :)

15:03 - that's about it for the semifinals! Stay tuned later today for the coverage of the Wildcard round that starts in 3 hours 30 minutes.

TCO 2010

The bad news (for me :)) is that I've crashed out of TopCoder Open 2010. Somehow got stuck in the easy problem for most of the contest, then somehow decided to open the hard instead of the medium and didn't manage to solve it in time.

Here's the solution for the hard that I had in mind at the end of the contest:
  • First, iterate over all possible cutoff scores between 0 and 1000, and over lastCutoff (last person to score exactly cutoff points), between 1 and 25.
  • For each such pair, calculate two arrays using DP:
    • What is the probability that out of first x persons, exactly k will advance (that means, score above cutoff, or equal to cutoff if their number is less than or equal to lastCutoff).
    • The same for last x persons.
  • In calculating those arrays, we make use of the fact that all scores are independent, so the DP is quite straightforward; we need to remember that we've already fixed the score of person lastCutoff to cutoff (but this still has probability of 1/n, not 1).
  • After we've got those arrays, let's calculate the probability of each person a advancing, and being b-th highest ranked person in the finals. This is quite easy now: we need to multiply the probabilities of:
    • b-1 people to advance from the first a-1 persons
    • k-b-1 people to advance from the last totalPersons-a persons
    • a-th person advancing. Here we need to keep in mind that if a is equal to lastCutoff, this should be the fixed 1/n probability since we're only interested in the case where he scores cutoff points.
  • And finally we accumulate the above probabilities by semifinal number, to get the answer for the problem.
This has complexity of O(1000*25*(25*25+timeOfInnerDP)). The obvious implementation of the inner DP runs in O(25*25*25), processing each state in O(25) operations. 1000*25^4=400M, so this will probably already run in time (since the number of states is not 25*25 but 25*25/2, and so on), but I have the feeling that since the inner DPs are very similar, we can try to run all of them in such a way that each takes just O(25*25). Ideas?. The inner DP actually runs in O(25*25) since each state is processed in O(1), so the total runtime is good enough. I was so close during the round!

The good news is that I'll now have a lot of free time at the TCO and will do live commentary on all 3 remaining Algorithm rounds: Semifinal 2, Wildcard and Finals. I will post the commentary both here and on the TopCoder blog. Stay tuned :)