Saturday, October 27, 2012

Petr Mitrichev Contest 10 solution ideas

Petr Mitrichev Contest 10 is over, congratulations to WJMZBMR, to an unnamed team from China, to tmt514, to UPC-1 team and to scottai1 for solving 4 problems, to flashmt for being the only one to solve problem D, and to hirosegolf for being the only one to solve problem F!

You can still solve the problems for practice at http://codeforces.com/gym/100110, and you can even start a virtual contest when you'll see that standings of other teams as they appeared at the particular time into the contest.

When making this contest, I've tried to add a flavor of exploration (as opposed to just using standard algorithms), since that actually resembles my work quite a bit. Please tell whether you liked that or not!

In particular, I like problem C the most, and I'm pretty sure it can be solved for 8x8, 10x10 or even 20x20 fields - but I don't know how. We can probably set up a challenge to solve it for 10x10. My feeling is that some statistical methods should be the most powerful here.

Here are the solution ideas. Those are not complete solutions, but I hope they can guide you in the right direction. Problem statements can be found at http://goo.gl/uIW7S. You can also find a video of myself doing analysis of this contest (without problem G) in Russian at http://youtu.be/FN6FRXiQ7JE. Please don't read below if you still want to try solving those problems yourself :)

Problem A. Asymmetric Art.
This problem can be solved with backtracking. The straightforward backtracking tries all possible subsets, but that's obviously too slow. We can do several optimizations: first, let's consider the numbers in increasing order, and when taking each number we can 'mark' all numbers that we can't take anymore because of this new number. This is not fast enough, but we can additionally truncate our search by saying 'if we've reached number x, then we'll take not more than answer[n-x] more numbers', where answer[y] is the number of items in the largest quasi-symmetric-triple-free subset for n=y. Then we can find all answer[] values in under one minute, and then send a program that has all answers hardcoded for judging.

Problem B. Lots of Combinations.
We need to do two things: find (n,k) modulo 10**10, and check if (n,k) is greater than or equal to 10**10. The second part is relatively easy - if k <= n-k, then we can incrementally compute (n,1), (n,2) and stop as soon as we exceed 10**10. For the first part, we notice that it's enough to compute the answer modulo 2**10 and modulo 5**10, and then use the Chinese remainder theorem. Computing the answer modulo 5**10 is done by first calculating the power of 5 in (n,k) separately, and then calculating (n,k) with all powers of 5 removed from all numbers using the fact that we can now use division and thus just need to calculate factorials (skipping numbers divisible by 5) modulo 5**10, and numbers modulo 5**10 is a periodic sequence.

Problem C. Curiosity.
I'm pretty sure there are lots of different approaches that work in this problem. The two main directions I'm aware of is finding large blocks of data without unknowns and combining them together, or using various dynamic programming approaches that consider all possibilities. My solution is of the second kind: assume we've chosen the 6x6 field. We can then use dynamic programming to find the minimum possible number of contradictions in the input data (places where we know the color of some cell but it's different from the actual cell we're on). Now I use simulated annealing to find the 6x6 field that has zero contradictions. Of course, since all input files are given to you, you can do this without any time or memory limits and then just submit the discovered answers.

Problem D. Domination.
Since one of the dimensions is small (up to 10), we can do dynamic programming using a cut in that dimension as our state. More specifically, suppose we have filled first k columns and first p cells in the (k+1)-th column. Then we're only interested what happens on the first p cells in the (k+1)-th column and the last (n-p) cells in the k-th column. Each cell can be in four states (has '1', has '2', has '0' and doesn't have an adjacent '2', has '0' and already has an adjacent '2' (let's call this state '3')), yielding 4**10=2**20 states, but it turns out we don't need all of them - some pairs of adjacent cell states are not useful: '21' and '11' can always be replaced by '23', and '20' is plain impossible. That brings the number of states down significantly, but the program is probably still too slow. The final touch is to notice that things become periodic if we fix n and increase m.

Problem E. Easy Learning.
This one probably has even more different solutions than C. There is a lot of theory for this kind of problem that I don't want to recite here, my solution was using gradient boosting.

Problem F. Hash.
There are two main approaches here. One is called Pollard's rho algorithm, and does not depend on the hash function much, the other actually uses the specific hash function we use. Consider values x_i = hash('1000 zeroes but i-th character is 1') - hash('1000 zeroes'). Then we need to find two disjoint subsets of x_i with the same sum. Let's do the following trick: sort all x_i, and take y_1=x_2-x_1, y_2=x_4-x_3, y_3=x_6-x_5, and so on. It's not hard to see that now we have 500 numbers (2 times less), but the numbers themselves are about 1000 times less on average, and we still need to find two disjoint subsets with equal sums. Repeating this several times gets us numbers that are so small that two of them are bound to be equal. This problem has some very tricky cases when b=2 - you should watch out for those!

Problem G. RLE Size.
Here, you just need to carefully consider all cases. Each block of consecutive '?' signs can be solved on paper, based on what's to the left of it and what's to the right, and then you can just implement the formulas you've discovered on paper in your program.

Problem H. Good Students and Bad Students.
This problem can be solved greedily. Let's go from the highest numbers to the lowest numbers, and gradually fill all groups. Whenever we encounter a student that wants to be in the upper half, we place it in the first upper half that still has empty slots, if any. Whenever we encounter a student that wants to be in the lower half, we place it in the first lower half that has the corresponding upper half completely filled, if any, and in the first free upper half slot otherwise. The proof is a bit tricky but doable.

Problem I. Tennis Scores.
This was the classical 'long statement, straightforward but long code' problem. You just had to carefully calculate the probabilities of game outcomes for each player's serve, then use those to calculate the probabilities of set outcomes, and then use those to calculate the probabilities of match outcomes. It's important to notice that set outcomes depend on who's serving first in the set.

Problem J. Three Squares.
This problem was supposed to be solved numerically. One step that you need to make to avoid falling into a trap is to realize that we're not always rotating all three squares by the same angle, however improbable that might look. Here's an example:


Then you could either just iterate over possible rotation angles with a reasonable step and check whether there's intersection (since all coordinates are integers, there's no possibility to create a particularly nasty case for that solution), or, if you want a more robust solution, you can do a search in the 3-D space of angles that repeatedly splits the search space in two, but then throw away certain branches when we can see that for all possible triples in that branch there's still an intersection.

1 comment:

  1. Thank you very much for such a nice  contest :)

    ReplyDelete