Sunday, March 30, 2014

This week in competitive programming

TopCoder SRM 614 took place on Saturday (problems, results, top 5 on the left). The most interesting things happened with the hard problem. It went like this: you are given a NxM rectangular field wrapped like a torus - to the right of the last column is the first column, and above the last row is the first row. You start in cell (0, 0) and make one random step per second - one step to the right or one step up, each with probability 50%. What is the expected number of steps before reaching a given cell (X, Y) for the first time? N and M are up to 100.

If you're logged into TopCoder, you can view the fastest solution for this problem, which required just 9 minutes to write (and could've probably been done even faster). It solves a natural system of equations: if we introduce a variable for the expected number of steps to reach the goal from each cell, we get a very simple set of equations: for each non-final cell, its variable equals one plus half the sum of the variables of the two cells we can move to. This system has 10000 equations and 10000 variables, but each equation involves just 3 variables. The code in question solves the system iteratively: start with all zeroes, then repeat 7000 times: update the value of each variable according to the corresponding equation, using the current values of the other two variables.

There are different ways to organize such an iteration. One way is to use "generations": we produce NxM new values based on the NxM current values. Such an approach is essentially equivalent to a DP which computes the expected value assuming we'll need at most 1, 2, ... steps to reach the goal. And that's why it doesn't work here - we'd need much more than 7000 steps to get a good approximation of the answer.

But the solution in question does something more tricky: it has just one set of NxM values, and updates them in place, going in decreasing-row then decreasing-column order. Such an order means that when processing subsequent values, we'll be more likely to use values that were already updated during the current iteration, and thus potentially "consider" more than 7000 steps: instead of processing just one step per NxM pass, we're processing something like O(M) horizontal steps and O(N) vertical steps, since we're tracing an entire part of the path that does not wrap around the torus. This part is a bit hand-wavy, but I hope it makes some sense (alternatively, please explain why it doesn't!). This speeds up the convergence about 50-100 times, and several hundred thousand effective steps are already enough to get a good enough approximation.
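To make this concrete, here's a minimal sketch of such an in-place iteration - my own code, not the actual submission. I'm assuming that both "right" and "up" increase the corresponding index modulo the field size, and that (X, Y) differs from (0, 0):

    # Gauss-Seidel-style in-place iteration for the expected number of steps.
    # e[r][c] approximates the expected number of steps to reach (x, y) from (r, c).
    def expected_steps(n, m, x, y, iterations=7000):
        e = [[0.0] * m for _ in range(n)]
        for _ in range(iterations):
            # Decreasing-row, then decreasing-column order: the cells we can
            # move to were usually already updated in this very pass.
            for r in range(n - 1, -1, -1):
                for c in range(m - 1, -1, -1):
                    if r == x and c == y:
                        e[r][c] = 0.0  # we are already at the goal
                    else:
                        up = e[(r + 1) % n][c]      # one step up, wrapping around
                        right = e[r][(c + 1) % m]   # one step right, wrapping around
                        e[r][c] = 1.0 + 0.5 * (up + right)
        return e[0][0]  # expected number of steps starting from (0, 0)

Replacing the in-place update with a separate "new generation" array turns this into the much slower variant from the previous paragraph.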

During the round, the admins apologized that such a simple iterative solution worked, saying it was unintentional. But I think there's nothing to apologize for - this is actually a clever solution that relies on a tricky insight about the right order of processing the cells, so kudos to everyone who came up with it!

Codeforces Round 239 took place on Sunday (problems, results, top 5 on the left). My solution for problem C relied on the fact that falling factorials, or falling factorial powers, obey the same binomial law as ordinary powers. More precisely, let x^(n) = x*(x-1)*...*(x-n+1). Then (x+y)^(n) = sum over k of C(n, k) * x^(k) * y^(n-k). See this Wikipedia article and its outgoing links for more details. This and related identities make it possible to operate on polynomials expressed as sums of falling factorial powers in exactly the same way as on the usual sums of powers, and in this problem using such polynomials sped up the computations about 100 times!
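As a quick sanity check, here's a tiny script (mine, purely illustrative) that verifies this identity numerically for small arguments:

    from math import comb

    def falling(x, n):
        # Falling factorial power x^(n) = x*(x-1)*...*(x-n+1).
        result = 1
        for i in range(n):
            result *= x - i
        return result

    # Check (x+y)^(n) == sum_k C(n, k) * x^(k) * y^(n-k) for small values.
    for x in range(-5, 6):
        for y in range(-5, 6):
            for n in range(8):
                lhs = falling(x + y, n)
                rhs = sum(comb(n, k) * falling(x, k) * falling(y, n - k)
                          for k in range(n + 1))
                assert lhs == rhs
    print("The Vandermonde-style identity holds for all tested values")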

Thanks for reading, and see you next week!

P.S. I'd like my posts to be written in good English. But I'm afraid people with really good English don't think they are - please don't hesitate to point out grammar and other errors! I want to improve :)

Sunday, March 23, 2014

This week in competitive programming

The working week was free of programming contests - but come Friday evening, the competition was in full swing with TopCoder SRM 613 (problems, results, my screencast, top 5 on the left). The screencast is full of frustration at not being able to squeeze the 500 inside the time limit (my solution was about 2-3x too slow, although I believe its asymptotic complexity was similar to that of the correct solutions), and then of the excitement of getting the 900 working with just 1 minute to go in the coding phase.

The 500 problem went like this: how many ordered n-tuples of numbers exist with each number between a and b, and all numbers in the n-tuple relatively prime as a set? a, b and n are up to 10^9, b-a is up to 10^5. There is, in some sense, only one possible approach - the inclusion-exclusion principle: first count all n-tuples, then subtract those where all numbers are divisible by 2, and so on. But there are several possible implementations of this approach, some faster than others :)
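Here is one possible implementation along those lines - a sketch of my own, assuming the answer is requested modulo 10^9+7. The key observation making the inclusion-exclusion fast is that the gcd of a non-constant tuple divides a non-zero difference of two of its elements, so it is at most b-a; the only constant tuple with gcd 1 is (1, 1, ..., 1):

    MOD = 10**9 + 7  # assumed modulus - adjust to the actual statement

    def count_coprime_tuples(a, b, n):
        limit = b - a
        # Sieve the Mobius function up to b-a (at most 10^5) with a linear sieve.
        mu = [1] * (limit + 1)
        is_composite = [False] * (limit + 1)
        primes = []
        for i in range(2, limit + 1):
            if not is_composite[i]:
                primes.append(i)
                mu[i] = -1
            for p in primes:
                if i * p > limit:
                    break
                is_composite[i * p] = True
                if i % p == 0:
                    mu[i * p] = 0
                    break
                mu[i * p] = -mu[i]

        # The all-ones tuple is the only constant tuple with gcd 1.
        answer = 1 if a == 1 else 0
        for d in range(1, limit + 1):
            if mu[d] == 0:
                continue
            cnt = b // d - (a - 1) // d          # multiples of d in [a, b]
            non_constant = (pow(cnt, n, MOD) - cnt) % MOD
            answer = (answer + mu[d] * non_constant) % MOD
        return answer

The whole thing is O((b-a) log n), which fits the limits comfortably.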

Codeforces carried the competitive spirit on with Codeforces Round 238 on Saturday evening (problems, results, top 5 on the left). al13n has won after a long break - in fact, his last win was one year minus one day ago :) As a result, he has jumped back into the top 10 by rating - congratulations! Komaki, on the other hand, showed some great consistency by getting into the top 5 both on TopCoder and on Codeforces.

And finally, the Open Cup returned on Sunday with the 11th round of this season (results, top 5 on the left). With about 3 months left before the World Finals in Ekaterinburg, it's natural to view the Open Cup as the best available gauge of the teams' strength. Not much has changed on this front, though: among the teams qualified for the World Finals, SPb SU 4 holds the lead, followed by Moscow SU Tapirs.

The Open Cup also gave rise to a nice strategy tip from my teammate Pavel Mavrin: https://twitter.com/pmavrin/status/447706027534721024

Thanks for reading, and see you next week!

Monday, March 17, 2014

This week in competitive programming

This week had two usual suspects: a TopCoder round and a Codeforces round. TopCoder SRM 612 (problems, results, top 5 on the left) featured a somewhat unexpected 450-pointer: you were given a set of cells on the infinite grid, and were asked to find another set of cells that has the same number of cells in each row and in each column as the original set, but has the smallest possible intersection with the original set. I'm calling it unexpected because the only solution I'm aware of involves min-cost-max-flow, and that seems pretty tough for 450 points. Is some simple approach possible here?

Codeforces Round 236 (problems, results, top 5 on the left) had the following problem as the hardest: you are given two trees, blue and red, with the same set of vertices. Then, we remove one edge from the blue tree, separating all vertices into two parts. There are some red edges connecting the parts - we remove all of them on the second step, separating the red tree into several parts. Now, there are some blue edges connecting the separate red parts - we remove all of them on the third step, and so on. You need to output which edges will be removed on each step, given the two trees and the first blue edge to remove. Trees have up to 200000 vertices.

The author's solution and both accepted solutions involve interval trees and require O(N log N) memory. So here's my challenge: can you come up with a solution that doesn't use any complicated data structures and uses O(N) memory?

Thanks for reading, and see you next week!

Monday, March 10, 2014

This week in competitive programming

We start by discussing what was left over from last week - the 10th Open Cup round (problems, results, top 5 on the left). The announcement of its results was delayed because of issues in the following problem (problem B in the statements): you are given n integer lengths, and need to construct any convex polygon with the given side lengths. This is a nice, if standard, problem. The most common approach is to build a polygon that is inscribed in a circle, and find the radius of such a circle using binary search (if all sides cover more than 2π, increase the radius, otherwise decrease it; handle the case where one side covers more than π carefully). The devil is in the details: as there were up to 10^5 sides of length up to 10^5, the radius of the circle had to be more than 10^9, which led to awful precision issues given that a precision of 10^-5 was required in the output - and the jury checker program was not free of such issues either.

I think this is a common mistake that contest authors make: they set the problem's limits so that their solution barely runs in time/space/precision. Instead, one should set the limits by taking into account both the correct solutions and possible wrong solutions. In this problem, I don't see any wrong solution of polynomial complexity, so having just 100 sides of length up to 100 would make the problem much better: the precision issues would go away, and one would still need to come up with the binary search and properly handle the corner cases.
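For reference, here's a rough sketch of that binary search (my own code; it only finds the radius, not the vertex coordinates, and ignores the precision hardening that this problem actually demanded):

    import math

    def angle(side, radius):
        # Central angle subtended by a chord of the given length.
        return 2.0 * math.asin(min(1.0, side / (2.0 * radius)))

    def circumradius(sides, iterations=200):
        # Binary search for the circumradius of a convex polygon with the given
        # sides, assuming such a polygon exists (the longest side is shorter
        # than the sum of the others).  If the circle's center is inside the
        # polygon, the central angles add up to 2*pi; otherwise the longest
        # side's angle equals the sum of all the other sides' angles.
        longest = max(sides)
        lo, hi = longest / 2.0, 1e18  # the radius is at least half the longest side

        def total(radius):
            return sum(angle(s, radius) for s in sides)

        center_inside = total(lo) >= 2.0 * math.pi
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            if center_inside:
                radius_too_small = total(mid) > 2.0 * math.pi
            else:
                radius_too_small = angle(longest, mid) > total(mid) - angle(longest, mid)
            if radius_too_small:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

With the actual constraints, getting this kind of computation to 10^-5 precision is exactly where the trouble described above begins.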

This week had just one contest: TopCoder SRM 611 (problems, results, top 5 on the left). I've skipped it, so I won't go into detail about its problems.

Instead, let's talk about last week's medium problem that I mentioned in my previous post: you are given several tasks, the i-th task consuming a_i energy when started and returning b_i energy when completed (but energy is not allowed to become negative while performing a task), with b_i < a_i. What is the maximum number of tasks you can perform given your starting energy?

Such problems are often solved by asking the following natural question: consider the final order in which we perform the tasks - for the solution to be optimal, swapping two adjacent tasks in that order should not make it better (for some meaning of "better"). So let's see what happens when we swap two adjacent tasks in that order.

Let's say the first task consumes a_1 energy and returns b_1 energy, and the second task consumes a_2 energy and returns b_2 energy. For the first task to be doable, we need at least a_1 energy, and the amount of energy will decrease by a_1-b_1. For the second task to be doable after the first task, we need to have at least a_2+a_1-b_1 energy in the beginning (since a_1-b_1 was already spent on the first task). If we do them in the reverse order, then by the same argument we see that we need to have at least a_1+a_2-b_2 energy in the beginning. The a_1+a_2 part is the same in both constraints, so the only difference is -b_1 vs -b_2. We can see that if b_1>=b_2, then -b_1<=-b_2 and thus a_2+a_1-b_1<=a_1+a_2-b_2, so doing the second task after the first task places a lower or equal requirement on the amount of starting energy than doing them in the reverse order, and it makes no sense to do them in the reverse order. That means that we can sort the tasks in non-increasing order of b_i, breaking ties arbitrarily, and then only consider solving them in the sorted order.

Having made this observation, the rest is relatively easy: we can do Dynamic Programming using (number of tasks solved, current amount of energy) as the state.
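Here's a compact sketch of that approach (my own code). Instead of literally storing the energy in the state, it keeps, for each number of solved tasks, the maximum energy we can have left - an equivalent view of the same DP:

    def max_tasks(tasks, start_energy):
        # tasks: list of (a, b) pairs - a is consumed when the task is started,
        # b is returned when it is completed, with b < a.
        # Sort by b in non-increasing order, as justified by the swap argument.
        tasks = sorted(tasks, key=lambda ab: -ab[1])
        # best[k] = maximum energy left after solving exactly k tasks so far.
        best = [float('-inf')] * (len(tasks) + 1)
        best[0] = start_energy
        for a, b in tasks:
            # Go over the counts in decreasing order so each task is used at most once.
            for k in range(len(tasks) - 1, -1, -1):
                if best[k] >= a:  # enough energy to start this task
                    best[k + 1] = max(best[k + 1], best[k] - a + b)
        return max(k for k in range(len(tasks) + 1) if best[k] > float('-inf'))

For example, max_tasks([(5, 3), (3, 1)], 5) returns 2: we do the (5, 3) task first, drop to 3 energy, and can then still afford the (3, 1) task.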

This simple and somewhat obvious idea - to consider what happens when we swap two tasks - is often crucial to solving this kind of problem. Sometimes it leads to a solution outright, and sometimes it can give a clue to what the next step is.

Thanks for reading, and see you next week! Also, here's a sunny Moscow landscape I see - spring is here!


Tuesday, March 4, 2014

This week in competitive programming

There were several contests during the week, but in the end I only get to cover one in the weekly summary: TopCoder SRM 610 took place on Tuesday (problems, results, top 5 on the left). It featured three somewhat standard problems - which didn't stop me from not solving one of them, of course. I think the most instructive problem was the medium: you are given several tasks, the i-th task consuming a_i energy when started and returning b_i energy when completed (but energy is not allowed to become negative while performing a task), with b_i < a_i. What is the maximum number of tasks you can perform given your starting energy?

This type of problem is often tricky to get right, but in many cases can be solved using a disciplined approach that I will try to describe in a future post.

There were also two Codeforces rounds during the week, and an Open Cup round on Sunday. Codeforces has unfortunately since lost last month's data due to a hardware failure, so there's no way to look at the problems and results now. Open Cup results, on the other hand, are still not finalized because of precision issues in one of the problems, so I will have to cover them in next week's summary - stay tuned!