Friday, October 15, 2010

TCO 2010 Finals Hard

With no better idea of spending some time on the Las Vegas - New York plane, I've decided to try coding the hard problem from the TCO 2010 finals (statement linked). This took me 26 minutes and I got 618 points in the practice room — knowing the solution in advance and using IDEA instead of a simple text editor (rng_58's time in the finals was 48 minutes, for 463 points).

I hope the code with the below explanations can serve as a semi-editorial while we're waiting for the official editorial.

What is TopCoder Mod Dash?

At this year's TopCoder Open, I've naturally enjoyed watching the algorithm and marathon competitions. However, there was one more which I enjoyed and found very watchable and intense: the Mod Dash competition. So here's a small educational video about what this competition is about:



There's also another video that I've made at the TCO, which is the usual overview of the arena. It is a bit boring, but just in case, here it goes:

TopCoder Open 2010 Algorithm Final Round

Here will go the comments for Algorithm Final Round of TopCoder Open 2010, the pinnacle of the event.

13:16 - 15 minutes before the start. Today's lineup: ACRush, liympanda, Louty from China, dzhulgakov and Klinck from Ukraine, PaulJefferys from UK, bmerry from South Africa and rng_58 from Japan.

13:18 - ACRush, liympanda, bmerry and PaulJefferys are returning participants of the TCO onsites, while Klinck, dzhulgakov, Louty and rng_58 are here for the first time. Klinck has been participating in TopCoder for 7 years, though, and has almost always been near the top of "Submission accuracy" rating. The effort has finally paid off!

13:23 - watching the screens of dzhulgakov and Klinck today.

13:25 - the introduction has started.

13:29 - 1 minute before the start.

13:34 - the easy problem: you have up to 50 election districts, and the voting happens the following way: in i-th district, there are voters[i] voters and each of them votes for somebody. Then, for each district the candidate who's got the largest number of voters wins. In case of a tie, nobody wins. You are asked about the smallest number X such that when X people vote for your party's candidates, no matter which X people, you're still going to have strictly more than half of all winners.

13:39 - the solution is quite easy and suggested by the examples. In the worst case, in order NOT to win you can get all votes in one half of the districts (the largest ones), and one less the half votes in all other districts. This requires care about ties, and different handling for odd/even number of total districts. Most people will submit something like this, but I expect an eventful challenge phase here.

13:40 - submissions from PaulJefferys, rng_58, dzhulgakov and ACRush. Only rng_58 have opened the medium.

13:41 - syg96 suggests a solution that will probably have less corner cases: iterate over how many district wins, draws and losses are we going to have (in the worst case of not winning). It's quite obvious that wins should go in the largest districts, then draws, then losses. Then just take the maximum over all scenarios where we don't win plus one.

13:42 - the medium problem: what is the smallest area of a square with vertices in lattice points that contains exactly n lattice points inside? n is up to 10^12.

13:44 - a submit on easy from Klinck.

13:44 - suppose the side vector of the square is (x, y). Then the number of lattice points inside is x^2+y^2-2*gcd(x,y)+1, according to the Pick's formula.

13:48 - this should be equal to n. Since we need to minimize the area, we need to minimize gcd(x,y).

13:50 - a lot of numbers are representable as sum of two squares. More precisely, all prime factors of form 4k+3 should have even exponent in the prime factorization.

13:51 - meanwhile, PaulJefferys have resubmitted the easy and liympanda has submitted it.

13:56 - so I would guess the following should work: we iterate over possible value for g=gcd(x,y). When that value is fixed, we get x^2+y^2=n+2*g-1. We substitute x=g*p, y=g*q, then we get p^2+q^2=(n+2*g-1)/(g*g). So we need to check if there's a sum of squares of relatively prime numbers equal to the given number. And for that, we can just do bruteforce since we'll need to check sqrt(n) values for g=1, sqrt(n)/2 values for g=2, and so on, for the total of sqrt(n)*log(n) values. This should run fast enough.

13:56 - dzhulgakov seems to have a working solution in terms of time limit, but it doesn't pass the examples.

13:57 - meanwhile Louty submits the easy, and everyone is working on the medium, including bmerry who didn't submit the easy.

13:58 - dzhulgakov fixes his bug, but still unsure about TL. ACRush has already submitted some time ago.

14:00 - rng_58 submits the medium, too.

14:01 - dzhulgakov speeds up his code from about 1s to about 0.2s, but seems to give WA now. He's looking into STL docs now.

14:03 - rng_58 have opened the hard, dzhulgakov has found his bug (int instead of long long). Now his sol works in 0.2s and seems correct. Going to submit?

14:04 - dzhulgakov submits.

14:06 - the hard problem: you are given a 44x44 rectangle of 'R', 'G' and 'B' characters. You consider all possible ways to take two (possibly intersecting or partially lying outside) kxk squares, then take all of the jewels inside at least one of those squares, then discard some of them and place all the rest in one chain. How many different chains can you get, modulo 10^9+9, if two chains that are reverses from each other are considered equal?

14:11 - the obvious idea for the first step is considering all pairs of two squares, and recording how many Rs, Gs and Bs are there in their union. This can be checked in O(1) for each pair if we precalculate partial sums of upper-left rectangles.

14:11 - meanwhile, liympanda and PaulJefferys also submit the medium.

14:13 - ACRush still didn't open the hard, he's checking his solution for the medium. People watching his screen say he didn't find any bug, he's just checking it very thoroughly. Maybe he knows about a problem but is not sure if it can be triggered by the possible testcases?

14:15 - then what we can do is iterate over possible values of how many Rs and Gs are there going to be in the chain. For each such pair, we calculate the largest amount of Bs possible, and now we need to calculate the number of chains with this many Rs and Gs and at most this number of Bs in O(1).

14:17 - if we forget about "up to reverse" condition, the formula is quite straightforward: ((R+G) choose R) times (1+((R+G+1) choose 1) + ((R+G+2) choose 2)+...+(R+G+B) choose B). How do we find that in O(1)?

14:19 - I guess we can just precalculate that for all values of R+G (which are up to 2*44*44 in total) and B (up to 44*44), that should be fast enough. So the only thing to deal with is "up to reverse" condition.

14:24 - And this is dealt with as usually by finding how many stay the same up to reverse (this means looking at a sequence of length n/2, with probably special casing for the middle cell when n is even), and then dividing all the remaining amount by 2 since all others split into pairs. answer_up_to_reverse=(answer+answer_thats_same_as_reverse)/2. This can also be viewed as the Burnside's lemma application.

14:24 - meanwhile, ACRush has resubmitted the easy which drops him to 5th in the current standings. Everyone has submitted the medium, bmerry still didn't submit the easy. rng_58 is first, dzhulgakov is second.

14:29 - now that we seem to have solutions for all problems, let's just watch what the contestants do.

14:31 - ACRush, rng_58 and bmerry seem to be coding something similar to the above solution. Most others are thinking. dzhulgakov is rechecking his easy solution.

14:33 - or maybe he's not? he has the easy code open, but maybe he's thinking about the hard on paper?

14:37 - some discussion of the easy between the spectators suggest it's quite hard to do it by considering all cases - the solution suggested by syg96 above (and dzhulgakov has implemented something very similar) is indeed the way to go.

14:41 - rng_58 is already testing his hard, not sure if it's the version without "up to reverse" or the final one.

14:42 - rng_58 has a bug in counting the number of Rs, Gs and Bs: x and y swapped in one place.

14:43 - liympanda resubmits the easy as well! This is going to be a hell of a challenge phase.

14:45 - everyone except dzhulgakov and liympanda is working on the hard, but it seems like not many are close enough to submit in 10 minutes.

14:47 - Louty gives up. bmerry, ACRush and rng_58 are all debugging one different example cases. Paul and Klinck are still coding.

14:48 - Klinck gives up as well, tests his medium. Maybe giving up now will actually prove crucial for the challenge phase?

14:49 - or maybe not! rng_58 passes all example cases, tests for TL...

14:51 - and he submits!

14:52 - and ACRush submits, too. He's above but the difference is less than 5 points. I guess challenges WILL be important, just not for everybody :)

14:54 - ACRush still looking at his hard - this is probably a bad idea. Should be working on challenge cases instead. Nobody else submits, the coding is over.

14:56 - PaulJefferys is preparing a large random testcase for ACRush and rng_58's hards. We'll see if those seem to be correct at the very beginning at the challenge phase. That will probably affect the further challenging strategy of everyone.

15:00 - most people went for 250s, bmerry is reading 500s. Paul decided not to blind-challenge the hards.

15:01 - ACRush successfully challenges Louty's solution!

15:02 - and there goes dzhulgakov's easy.

15:04 - the challenge case for both is "10, 10, 5, 4, 1" with the correct answer of 25. The trick here is that one needs to win the first two and draw the FOURTH in the worst case, while dzhulgakov only considered winning some largest ones and drawing some next largest ones.

15:08 - just one minute left, still no challenges on the hards. C'mon, Paul!

15:10 - Paul looked at the hards but decided not to challenge. ACRush got -25 in the final seconds.

15:10 - let's wait for the systests...

15:16 - ACRush seems to be pretty confident in his solutions. His only concern is the doubles he uses in the 500.

15:17 - ACRush has also told me that he didn't test his 1000 at all except the examples, but he doesn't think there'll be any TL issues.

15:18 - I think I will take a break now before the results are announced.

16:00 - rng_58 is the champion! ACRush's hard, Paul's easy and liympanda's medium failed, all others passed.

16:01 - wata is the Marathon champion! chokudai is second, doudouille is third. Go Japan!

16:03 - as expected, Margarita is the Mod Dash champion!

16:05 - and djackmania is the Studio champion! I think that's it for today's live coverage. However, please check this blog later for some videos of the event.

Thursday, October 14, 2010

TopCoder Open 2010 Algorithm Wildcard Round

Here will go the comments for Algorithm Wildcard Round of TopCoder Open 2010.

18:22 - it's very quiet in the arena, since most people who are not competing went to enjoy the fabulous Las Vegas (or just went to their rooms to get some sleep?). Most of algorithm competitors are here, though, even the ones who're not competing.

18:27 - I'm sitting in front of dzhulgakov's and syg96's screens. Hope that those two won't disappoint us, as rng_58 did in the semifinal.

18:30 - go!

18:30 - the easy problem: you start with N distinct numbers, and then can swap two adjacent numbers M times. How many different sequences can you get?

18:31 - N and M are up to 2000.

18:32 - so if a sequence X has Y inversions (pairs of indices such that the numbers at these indices are not in the same order in the original sequence), then we can get it using Y, Y+2, Y+4, ... swaps. The problem is thus reduced to counting sequences which have at most M inversions, and have the same amount of inversions modulo 2. I think this is done via quite simple DP - if you know the first number, you know how many inversions it creates, and then continue with the rest. This should work in O(2000^2).

18:35 - everybody is doing the easy problem, no variability from the contestants this time.

18:37 - PaulJefferys has submitted and then found out that his code TLs on (2000, 2000) :( Thanks Rustyoldman for the pointer!

18:40 - oh, AS1_PML30 has pointed out that my above solutions works in O(2000^3).

18:40 - dzhulgakov submits! And the DP is not so difficult to code in O(2000^2) by noticing that we can use accumulated sums to process each state in O(1).

18:42 - the medium problem: you're given at most 50 cards of form "+2", "*3". You shuffle then randomly, and then lay them down next to "0" to get an expression like "0+1-2*3", which is equal to -5. What is the expected value of this expression?

18:44 - first, we can notice that the order of "+" and "-" cards is not important. We can imagine the problem as starting with one "0" card and several "+X" and "-Y" cards, and then we need to place all "*Z" cards next to one of the starting cards, and the order of doing so for each starting card matters.

18:47 - maybe one can just do DP on the number of remaining cards of each type ("*1", "*2", ..., "*9", "*0")? That should be at most 5^10 states, which is about 10 million, but the transition will be quite slow.

18:54 - or maybe we can rely on the fact that we only need to find the expected value somehow? People seem to be at a loss around here.

18:54 - it turns out it's just me who's at a loss. Let rng_58 explain.

19:00 - still discussing the solution...

19:09 - so we need to calculate the expected value for each additive term, then sum those up. That means we should calculate just one number that depends on the multipliers: the expected multiplier for each additive term. At the simplest, we could iterate over all possible at most 5^10 compositions of a multiplier and average those with appropriate weights (some factorials multiplied and divided :)).

19:11 - bmerry suggests we can do it even faster: let's add multipliers one by one, and keep track of the expected value of the product if we have exactly k multipliers for each k, and the number of ways to get exactly k multipliers. Then this will be just something like O(50^2).

19:13 - another spectator reminder from Rustyoldman: everybody had submitted the easy and everyone is doing the medium... Not anymore! grotmol has opened the hard!

19:15 - here's the hard problem: how many ways are there to select some cells in a H times W rectangle such that no cells in one row are closer than K from each other?

19:16 - and the juicy part is that H, W and K are up to 1 billion!

19:19 - one simple part of solution by Onufry: we can forget about H and just take the answer to the power of H and subtract 1 in the end.

19:21 - another idea is that we need to calculate the number modulo 100003 which is a relatively small prime number.

19:25 - people are thinking about the hard problem. Most contestants are working on the medium, still no submissions. Two have opened the hard.

19:26 - dzulgakov submits the medium! I'm not sure what exactly his solution does, but it separates the additive and multipliers, and does some factorials afterwards :) He passed all examples.

19:32 - four people have submitted the medium now. The communal wisdom suggests that if we fix the number r of rooks in one row, then the number of ways is (W-K*(r-1)) choose r. Where do we go from here?

19:34 - and that's where the small prime number calls into play! In order to calculate a choose b modulo p, we can write both a and b in p-ary system, calculate digit-by-digit "choose" numbers, and multiply them. And to calculate those, we can pre-calculate all factorials modulo p and their inverses.

19:39 - so if the number r is relatively small, say, up to 10 million, we can do this. And when it's more than that, it means K is not more than 100 and thus we can do the usual matrix multiplication to find the number of ways (using recurrence f(n)=f(n-1)+f(n-K)).

19:41 - this solution was brought to you by ploh, Onufry, bmerry and jdmetz :)

19:44 - bmerry suggests we can avoid the matrix multiplication by grouping the r's which are the same modulo p and applying the above theorem about the number of combinations to each of those sums.

19:47 - nobody is writing anything that resembles a solution for the hard. Paul has calculated inverses modulo p, gmark has found the answers for small parameters and is staring at them. Unless we're missing a really simple solution, no hard will be solved in this round.

19:50 - izulin and grotmol are still working on their mediums, all others either do nothing or recheck their solutions.

19:52 - I believe there were no submissions in the past 30 minutes. dzhulgakov, Vasyl, syg96 and PaulJefferys have two problems (in this order), tomek, izulin, gmark and grotmol have one problem (in this order). Each of the groups of four is separated by less than 50 points, so any challenge in the first group brings the challenger to the top (and probably to the finals :)).

19:55 - The coding is over. I wonder what is the best strategy for the challenge phase. The medium has expected values so I bet it's pretty hard to fail the systests if you pass examples. Maybe when all numbers are multipliers, or when there are no multipliers at all? For the easy I can only imagine the tricky case when M is more than N*(N-1)/2.

20:01 - tomek tried the maxtest against gmark.

20:03 - the challenge phase is pretty calm, most people take a long time to read one solution.

20:05 - but just one challenge from syg96 or PaulJefferys can change who goes to the finals!

20:10 - a desperate last second challenge by gmark, and that's it. Waiting for the systests!

20:18 - most people agree that probably no solution will fail. More interestingly, Ivan Metelsky has confirmed that our solution for the hard is the expected one!

20:23 - Vasyl's and syg96's mediums have failed. dzhulgakov and Paul go to the finals! Stay tuned tomorrow for the coverage of the finals.

20:27 - here's the reason for the failure of the mediums: if you add up all additive terms in advance, you pass. But if you first multiply them and then add up many numbers, and when the expected answer is ZERO, but the multipliers are big - you're doomed, since small error relative to the product of the multiplier becomes big absolute error. That's a cool bug to fail to :)

Wednesday, October 13, 2010

TCO 2010 Algorithm Semifinal 2

Here will go the comments for Algorithm Semifinal 2 of TopCoder Open 2010.

12:56 - announcement complete, 3 minutes before the start. Will be looking at rng_58's screen during the first minutes.

13:00 - Into the fray!

13:01 - the easy problem: given several points on the plane, what is the maximal possible f(X) over strings X of the given length L consisting of "LRUD" characters. f(X) is defined as the probability of a robot starting at (0,0) will end at one of the given points if it performs each command with probability 1/2.

13:02 - L is up to 50, the number of locations is also up to 50. Thinking...

13:07 - rng_58 has started coding. It seems that the solution is quite straightforward: iterate over the number of "L", "R", "U" moves - the number of "D" moves is the rest. Then we need add up the probabilities of hitting each cell, which can be calculated using pre-calculated arrays of "what is the probability to get number x if we sum a (-1)'s and b (1)'s, each with probability 1/2". We need O(50^3) for calculating these arrays, and then O(50^3) possible strings will take just O(50) time each, which should be fast enough.

13:09 - rng_58 seems to have coded exactly this, probably has an off-by-one error somewhere.

13:11 - found the bug, submitted! ACRush and PaulJefferys have submitted as well. rng_58 is testing the maxtest and verifying the code, not moving on to the next problem yet.

13:12 - here goes another problem (not sure if it's medium or hard): given an even amount (up to 50) of boxes with given amount of gold and cost for each, you need to split them into two halves such that each half has the same (n/2) amount of boxes, and the sum of average costs per gram for the two halves is the minimum possible.

13:14 - the amounts and costs are up to 500.

13:27 - Suppose we fix the total weight of all boxes in one half. Then we sum two fractions where the sum of numerators is constant, and the denominators are given. It's obvious that to minimize this sum, we need to minimize the numerator of the fraction with the lower denominator. That leads us to the following DP: out of first a boxes, we took b boxes with total weight w - what is the lowest total cost to achieve that (we can get the highest by looking at the complement)? This should have running time of 50*50*50*500 which seems too big. rng_58 seems to have coded and submitted this. He hasn't tested it on maxtest yet.

13:28 - he's ran the maxtest, and the solution is very fast. Why?

13:29 - probably since 50 should actually be 25 in the above formula.

13:29 - rng_58 still doesn't feel confident enough to go to the hard. Let's take a look at the hard ourselves.

13:30 - the problem statement of the hard is amazingly simple. Given three arrays A, B and C of at most 34 elements each, find the number of ways to choose at least k indices such that if we sum the values of A, B and C for the chosen indices, the A's sum is the largest.

13:32 - as dzhulgakov correctly suggests, 34 suggests this will be meet-in-the-middle :)

13:33 - rng_58 is still thinking about this, and so are we.

13:35 - there's a cool constraint that k isn't more than 7, so all sufficiently large sets are allowed. Inclusion-exclusion? We're at a crossroads :)

13:36 - the numbers are up to a billion, so no help here. Let's forget about k first and solve the problem about the total number of ways.

13:43 - together with Dmytro, we seem to have solved the one with no k: if we build a vector (A[i]-B[i], A[i]-C[i]) for each i, we need to calculate the number of sets of vectors such that their sum is in the 1st quadrant (has both coordinates positive). Suppose we split all vectors in two halves, and calculated all possible sums for each half. Then we iterate over the sums of the first half in the order of increasing x-coordinate, and build a range-sum tree by y-coordinate of the vectors from the second half that add up to a positive x-coordinate with the ones we've chosen from the first set. Then we need one range-sum query to find the number of elements in the second half that will pair with the given element from the first half.

13:50 - and now let's remember about k. Since k is small, we can instead find the number of sets with less than k elements, and subtract it from the overall count. And in order to do that, we can modify the above meet-in-the-middle algorithm so that splits the sums for each half into buckets with the given amount of indices, and... Wait! Since 34*33*32*31*30*29/1*2*3*4*5*6=1344904, we can just iterate over all subsets of at most k-1 elements and verify the condition for them. So this part of the problem is very easy :)

13:53 - now that we're done with the problems, let's take a walk to other people's monitors and discuss the submitted solutions.

13:54 - ACRush was struggling with TL in his hard solution, and now he's getting WA on one of the examples.

13:55 - People have pointed out that the range-sum tree is tricky here since the values are pretty huge, so you need to "compress" the indices and be very careful.

13:56 - there's a Studio competitors announcement during the round over the loudspeakers. That should be extremely annoying to the contestants.

13:57 - ACRush has fixed the bug and submitted! (or maybe it was not a bug but just some debugging run?)

13:59 - RAD seems to be getting memory limit in the medium. No wonder, since it seems to be a requirement to do the DP remembering only 2 outermost rows instead of all of them :)

14:00 - liympanda is also debugging his hard problem, but he has a binary search inside his solution for some reason. Maybe it's just for compressing the coordinates?

14:01 - no, he's using it to calculate the answer. That looks strange.

14:03 - griffon is still struggling with the easy. That strategy doesn't work, as I've rigorously proven in the first round :)

14:04 - PaulJefferys seems to be generating all sums, so that should be the part of meet-in-the-middle. He still has a long way to go.

14:05 - ACRush seems to be debugging his hard on a testcase of his own. I'm not sure if he's just testing it or if he's found some issue.

14:08 - Onufry submits the hard! His solution has very funny 6 nested loops in the end to iterate over all combinations to take the k parameter into account.

14:10 - going back to rng_58 - he seems to have got the solution ready, was testing it for some time, and finally submits!

14:12 - rng_58 prepares a large testcase and tests his solution...

14:13 - works in 0.229s!

14:14 - tomekkulczynski seems to have the correct idea in the hard as well, and he's also splitting the sums into buckets as we've suggested above before we saw the easier approach for the k part.

14:15 - 9 minutes to go in the round. ACRush and rng_58 have three, tomek, Paul and gmark have the first two, Onufry had only the hard, griffon has nothing, all others have only the easy.

14:17 - wata is doing the hard, and seems to work on some kind of meet-in-the-middle. However, the past 24-hour marathon seems to have taken its toll and he's writing code quite slowly. Somehow ACRush doesn't seem affected :)

14:19 - it's a bit sad that we can't view the competitors' code now, as we could endlessly speculate about possible bugs. Well, we'll have to watch them during the challenge phase and do our best!

14:21 - meanwhile, liympanda submits the hard.

14:22 - PaulJefferys is reading up some STL docs. I'm guessing he ran into problems with the range-sum tree. He seems to have given up.

14:24 - most people seem to be debugging frantically in the last minute. Onufry still can't get his easy to pass the examples. izulin submits the medium. A good challenge target?

14:27 - the problems all seem to be more difficult on the time limit side and not on the correctness side; so I'd expect all challenges, if any, happen because of time limits.

14:31 - izulin's medium is downed by RAD. ACRush has looked at it before that but decided not to challenge.

14:32 - rng_58 seems to be reading the chat history to verify if izulin's easy has already been challenged (I'd guess for time limit?). Hey, you can use right click --> history for that!

14:33 - another good shot by RAD! Onufry goes down. Let's look at RAD's screen :)

14:34 - a point for RAD - he first checks the history to make sure the problem hasn't already been challenged, as in that cases chances are probably much lower. That's what I should be doing!

14:35 - unsuccessful challenge on liympanda. RAD's challenge for the hard seems to be just large and random. So there's high chance liympanda will pass the systest.

14:36 - judging by RAD's successes, we'll see quite a few solutions failing systests as well.

14:37 - ACRush is looking at tomek's 500 very carefully.

14:39 - another two unsuccessful challenges for RAD. That probably makes sense since he's (almost?) the lowest scorer on the easy.

14:39 - tomek brings Paul's easy down! Yes, I expect the systests to be quite eventful.

14:44 - Paul says his easy has failed because the coordinates in the input can be up to 100, even though the length of the string is at most 50, so he only had a small array and probably died because of strange out-of-bounds access results.

14:46 - RAD told that his goal was at least +125, so he just challenged almost blindly. Worked out for +100, but not anymore :(

14:49 - everybody waiting for the results. It seems that ACRush and rng_58 are pretty solid, and will get into the finals even with one problem failing.

14:50 - actually that's not true. rng_58 can be fourth if his hard fails.

14:51 - so the systests only brought w_'s easy down. The finalists are ACRush, rng_58 and liympanda. tomekkulczynski, gmark, PaulJefferys and izulin go to the wildcard.

15:00 - Ivan Metelsky the Algorithm Coordinator told that the results for the second semifinal are as expected, while they wanted much more people to solve the medium in the first semifinal. Stupid me for not opening it :(

15:02 - speaking about the hard problem of the first semifinal - people have rightly pointed out that the inner DP actually runs in O(25*25), for the total running time of 1000*25^3, which is fast enough. The author of the problem can solve the problem in 1000*25^2 - so here's some homework for you :)

15:03 - that's about it for the semifinals! Stay tuned later today for the coverage of the Wildcard round that starts in 3 hours 30 minutes.

TCO 2010

The bad news (for me :)) is that I've crashed out of TopCoder Open 2010. Somehow got stuck in the easy problem for most of the contest, then somehow decided to open the hard instead of the medium and didn't manage to solve it in time.

Here's the solution for the hard that I had in mind at the end of the contest:
  • First, iterate over all possible cutoff scores between 0 and 1000, and over lastCutoff (last person to score exactly cutoff points), between 1 and 25.
  • For each such pair, calculate two arrays using DP:
    • What is the probability that out of first x persons, exactly k will advance (that means, score above cutoff, or equal to cutoff if their number is less than or equal to lastCutoff).
    • The same for last x persons.
  • In calculating those arrays, we make use of the fact that all scores are independent, so the DP is quite straightforward; we need to remember that we've already fixed the score of person lastCutoff to cutoff (but this still has probability of 1/n, not 1).
  • After we've got those arrays, let's calculate the probability of each person a advancing, and being b-th highest ranked person in the finals. This is quite easy now: we need to multiply the probabilities of:
    • b-1 people to advance from the first a-1 persons
    • k-b-1 people to advance from the last totalPersons-a persons
    • a-th person advancing. Here we need to keep in mind that if a is equal to lastCutoff, this should be the fixed 1/n probability since we're only interested in the case where he scores cutoff points.
  • And finally we accumulate the above probabilities by semifinal number, to get the answer for the problem.
This has complexity of O(1000*25*(25*25+timeOfInnerDP)). The obvious implementation of the inner DP runs in O(25*25*25), processing each state in O(25) operations. 1000*25^4=400M, so this will probably already run in time (since the number of states is not 25*25 but 25*25/2, and so on), but I have the feeling that since the inner DPs are very similar, we can try to run all of them in such a way that each takes just O(25*25). Ideas?. The inner DP actually runs in O(25*25) since each state is processed in O(1), so the total runtime is good enough. I was so close during the round!

The good news is that I'll now have a lot of free time at the TCO and will do live commentary on all 3 remaining Algorithm rounds: Semifinal 2, Wildcard and Finals. I will post the commentary both here and on the TopCoder blog. Stay tuned :)

Friday, June 11, 2010

Can you kill backtracking?

Here's a problem that has appeared at a recent contest: We are given a 7 times 7 field where some of the cells have numbers between 0 and 8, inclusive, and all remaining cells have dots. We want to find such allocation of exactly 10 stars to the cells with dots, at most one star per cell, so that each cell with a number has exactly that number of stars in its 8 adjacent cells. Moreover, we will only consider such 7 times 7 fields where such allocation of 10 stars exists and is unique.

Can you create a testcase for this problem that will kill backtracking solutions? It seems pretty hard to do at such small field, but it is possible!

Here are the official rules of this challenge: http://killbacktrack.appspot.com/rules.jsp.
And here's the website where you can test your testcases, which also keeps a scoreboard: http://killbacktrack.appspot.com/.

I've got a pretty good testcase, but I don't want to reveal it yet :) Please tell if there's some problem or possible improvement to the website.

Friday, February 26, 2010

Codeforces

My friends from Saratov are building a new system for hosting contests and beyond. They aim for everything there to be available both in English and in Russian. Today was its second beta-test, which was a contest under ACM ICPC rules for 2 hours with 3 problems. Results: http://codeforces.com/contest/2/standings. Problems: http://codeforces.com/contest/2.

Problem B is very nice. I scratched my head for a long time before getting it, yet the idea is very simple and beautiful. I won't spoil it yet - I hope you'll enjoy solving it as well :)

Problem C turned out to be a tough implementation problem for me (I solved a system of two quadratic equations on 2 variables by eliminating the squares to get a linear equation, substituting one variable from that linear equation into one of quadratic equations, and solving the result. This required a lot of care about precision issues and corner cases, and took an hour). However, I have a feeling that it is doable by some binary/ternary search. Any ideas?

Tuesday, February 23, 2010

World Finals Recap

Here're the comments that I've posted during the ACM ICPC 2010 World Finals in Harbin.

Read more

09:08 - Everyone is still waiting on the spectator balcony, teams
are still kept away from the competition room. Problem statements have just
been distributed to team workstations.

09:20 - Still nothing happens. The competition is more likely to start at 10 than at 9:30.

09:23 - Teams are starting to come in.

09:26 - spectators are cheering all the time as the teams take their seats.

09:32 - all teams are at their workstations.

09:35 - Bill promises to start the contest at 10am sharp.

09:39 - if you have questions for me, please post them at http://forums.topcoder.com/?module=Thread&threadID=664073. I'll check that thread from time to time.

09:43 - they're testing the new procedure for photographing the team with their first baloon and with the first baloon for any particular problem.

09:44 - by the way, is the video stream live yet? Maybe it's unnecessary for me to explain what's going on? Please answer at the forum thread above.

09:54 - I've moved to the watching auditorium so I'm watching the same video stream as everybody on the Internet (hopefully) does now.

09:58 - The contest has started!

10:00 - 11 problems. They promised to get the problems to spectators in several minutes.

10:04 - we got the problems. Hopefully they'll be available on the Internet soon as well.

10:06 - problem C is: given a n times m map, with at most w 1-cell-high horizontal walls, find out the number of cells from which you can't get to the top right corner by moving top and right. The coordinates are up to a million, w is up to a thousand. The solution seems obvious - first you compress the coordinates (find all distinct ones) to get at most 1000 distinct ones, and then do a simple DP.

10:11 - problem E is: given a 20 times 9 field with some cells occupied by rocks, find the longest path from top left corner to bottom right corner that doesn't touch itself. 20 times 9 dimensions point us to DP on subsets or something similar. I believe something like that should work: go through the table row-by-row, and keep which cells in the bottom row are connected to which, and which are not occupied at all. Quite hard to implement correctly, but the algorithm is not very complicated.

10:17 - problem G is: given at most 100 points on a plan with distinct x-coordinates, find the shortest cycle that passes through each point exactly once, goes from the leftmost point always to the right until it reaches the rightmost point, then goes always to the left until it gets back to the leftmost point. Additionally, two points are given such that the the path from left to right contains the first point, and the path from right to left contains the second point. This seems to be a very simple DP: after processing the last k points, and with the first path ending in point a and the second path ending in point b, what is the smallest total length to achieve that? This is O(n^2) states, transitions in O(n). We deal with the two special points by forcing the first path to contain the first one, and the second path contain the second one.

10:23 - problem J is: given a rectangle, is it possible to split it by repeatedly splitting it into two parts along a line parallel to a side into several rectangles with the given areas. Maybe we can solve that by trying to do a DP over (rectangle, part sizes multiset)? Since the area of the rectangle should match the sum of part sizes, it seems that the number of states will be reasonably small.

10:24 - Belarusian State University has solved problem J. Many wrong attempts on other problems.

10:29 - Cornell has solved G. I will probably stop commenting on most submissions since the scoreboard is readily available anyway.

10:33 - Problem D has been solved as well. The statement is: given a tree of castles, you need to capture them all. For each castle, you know how many soldiers you need to capture it, and how many will disappear during the process, and how many you should leave in the castle afterwards. It seems that you should just add up the last two numbers. Additionally, you may traverse each road at most twice, which essentially means you should do a normal tree traversal, and the only freedom you have is to choose the order of visiting the children of each vertex in the tree. So the solution seems to be: first, check all possible poits to start the attack - the root of the tree. Then, for each vertex we calculate the best way to capture the subtree rooted at each of its children, and then figure out the best order to visit that subtrees given the total number of soldiers required to capture each subtree X and the total number of soldiers that must be left in that subtree Y. Figuring out the order seems to be a classical problem after that - we should sort them by something like X-Y :)

10:50 - Problem I. When we split the path into four parts, each part has at most 16 cells, so we can just build all possible paths for each part (there're at most 3^16 of them, and in reality much less since we can't self-intersect, can't go outside the labyrinth and should reach the specific cell - so I hope there'll be just several thousands of those). Then, we group the paths by the set of cells they're visiting. Then, we combine the first part and the second part in all possible ways; then we combine the third and the fourth in all possible ways; then we use meet-in-the-middle to combine the complete path (for each subset covered by first two parts, we check if the complement can be covered by the last two parts).

11:12 - Problem B seems to be mostly about accuracy, as a lot of wrong submissions are showing, and many good teams are struggling with it. Generally, the first step of the solution is to separate narrow bars from large bars. To do that, let's take the smallest bar, and then all bars that are less than 50% larger than it should be narrow, and all others should be wide. After that, we should just check all given conditions carefully, and check that the lengths of the all narrow bars fall withing the specified plus-minus 5% of some number, the same for all wide bars.

11:21 - Problem A seems to be a not very complicated simulation. I expect some teams to attempt it in the first two-three hours. There's not much
I can say about the solution - you should just do what's written in the problem
statement, the constraints are reasonably small.

11:23 - the scoreboard is at Shanghai 4, Stanford and Moscow 3, others 2 and less.

11:31 - ACRush correctly suggests that the answer for problem I should be at most about 10000 (even without the constraints for 3 squares, it's in the millions), so pretty much any backtracking-style solution should work in time.

11:41 - Problem H doesn't have constraints, so it's unclear what is
the expected running time. Generally, the approach for this problem should
probably be the same as for the 2-dimensional case of this problem, which is
to go over the vertices from bottom to top, merging the lakes until they reach a point where the water can pour out of the lake. In 3-D this will require a lot of code, like finding the intersection of a horizional plane with the 3-D triangle, but the logical 'pouring' part should be the same as in 2-D case.

11:47 - ACRush points out that in problem B there's no example on the case where the barcode is reversed, which is mentioned only once in the problem statement and is easy to miss. Maybe that's why the teams are struggling with this problem.

11:56 - Problem F. The first step is to notice that since we need to find the total length, we can solve the problem separately for each triangle. Within each triangle, the countour lines are parallel to each other, so we should find one of them and the distance between them, and then carefully find the total length as the sum of two arithmetic progressions. This seems fairly easy to implement, but one has to be careful with the boundaries. There're at most 10K triangles, so we can afford some processing for each of them.

12:04 - Problem K seems to be straightforward from the algorithmic side as well. We try to place it on each side, and just have to carefully calculate
the position of the center of mass and check for stability. The geometric part
is quite standard 3-D routines (for rotating) and 2-D routines (for checking
that projection of the center of mass is at least 0.2 away from the border of the base). The most complicated part seems to be dealing with various possible 'bases' for the paperweight when the figure is not convex.

12:09 - so that's all for my guesses about each of the problems. It seems that there's no really complicated algorithmic background to any of them, and they mostly require careful implementation. Two of them are 3-D geometry, five of them
are DP, the rest are mostly straightforward but tricky implementations.

12:12 - I'm going for lunch now. Please ask your questions on the TopCoder forum thread I've liked to above :)

13:33 - The contest is going, there seem to be 5 problems that are solved by the top teams, and then there are 4 more problems that are solved by at least one team. The contest can still go to any of the top teams.

13:35 - Regarding competing together with ACRush - yeah, if they make an online contest next year, that should be a good thing to try :)

13:42 - They've finally enabled WiFi for coaches!

13:42 - From the 6 "remaining" problems, I think the ordering from the easiest to the hardest is K-F-A-I-E-H.

13:45 - They said they won't be distributing the balloons in the last hour, so I won't have any insider information to share :(

14:00 - So no more updates on accepted/rejected on the scoreboard. They're going to show the submitted runs, though.

14:03 - They told at the ICPC Live stream that SJTU have submitted their code for problem K on problem H, so it was either a mistake or they wanted to mislead other teams.

14:31 - Moscow State University and Warsaw University teams were happy
when they saw the outcomes of their respective runs, so it seems that they have
7 and 6 solved problems now, respectively.

15:16 - Results: Spb IFMO - 22nd. KTH - 12th. Fudan - 11th. Zhongshan - 10th. SpbSU - 9th. Warsaw - 8th. Saratov - 7th. Tsinghua - 6th. Petrozavodsk - 5th. KNU - 4th. Taiwan - 3rd. MSU - 2nd. SJTU - 1st.

16:23 - I've tried to upload a photo of the final scoreboard, but it turns out it is already available at http://uaimages.com/viewer.php?id=383434bord%201-21%20v2.png.

16:24 - Generally, SJTU obviously are very happy with their results, with many other medalists being disappointed they couldn't get better. For example the Bronze medal SpbSU have the same penalty time as the Silver medal Warsaw.

16:25 - Also, it seems that everyone here believes that Ural SU will also get a bronze medal for their 13th place since they're the only team except the first 12 to solve at least 6 problems.

16:25 - That's pretty much it, I think I won't be posting frequent updates here anymore, but may post some more news if I find them.

16:32 - The Moscow State University was trying to solve problem I as did the SJTU team, and both had a complete solution but failed to debug it. Spb IFMO team tried to solve two problems in the end and couldn't get any of them.

17:22 - The awards ceremony is going on. As Oleg Hristenko has rightly pointed out, we are still to announce some regional champions, and the results of the Russian teams that didn't make it to the first page of standings that was posted on the Internet.

18:00 - ICPC Challenge results: 4th place Saratov SU, 3rd place IME-USP, 2nd place Beijing Jiaotong, 1st place U of Canterbury.

18:24 - Tatiana Churina of Novosibirsk SU together with 6 other coaches (I'm being Russia-centric, I know :) got the Coaches' Award.

18:37 - Belarusian SU receives $1000 for getting the first accepted solution.

18:38 - Moscow SU (F), National U of Taiwan (I), NTU KPI (D), Kiev NU (B), Spb IFMO (C), SJTU (E), U of Wroclaw (K), U Maryland (H), Cornell U (G), Fudan U (A), Belarusian SU (J) are recognized for being the first to solve a particular problem.

18:48 - Africa and Middle East champion - British U in Egypt. Latin America champion - UF Pernambuco. North America champion - Stanford U. South Pacific champion - U of Western Australia.

18:54 - the medals are official! 4 gold + 5 silver (including SpbSU) + 4 bronze (including Ural SU). You know the rest, so I won't be posting the teams winning the medals again :)

19:02 - going to the stage with the Moscow SU team :)

Friday, February 5, 2010

ACM ICPC 2010 World Finals

I hope to be posting updates on the ACM ICPC 2010 World Finals here if I manage to get an internet connection in the Harbin Engineering University: http://77.41.63.3/icpc2010/finals.html.