Thursday, September 24, 2015

A week with another 6

TopCoder SRM 668 was the first contest of the week in the early hours of Wednesday (problems, results, top 5 on the left). ACRush was the highest-rated competitor, most probably practicing before the upcoming final TCO elimination round. He placed 6th, which is exactly the last qualifying spot in the TCO - but would he be able to repeat this performance in the TCO round? Read until the end of this post to find out :)

Enot was the winner of this SRM and the only contestant to solve all 3 problems - very impressive! This is his first SRM victory, maybe the streams have helped him improve?

Codeforces Round 320 (Bayan Thanks-Round) took place a few hours later (problems, results, top 5 on the left). Quite a few contestants managed to solve 5 problems out of 6, so in order to win one also needed to find a few bugs in the competitors' solutions. Um_nik and Egor excelled on this front with +500 points each, but Um_nik had slightly more points from his solutions and claimed the victory - congratulations!

Russian Code Cup 2015 Finals was one of the "major" tournaments of the year, with $4500 up for grabs for the first place (problems in Russian, results, top 5 on the left, broadcast with commentary in Russian). The problemsetters seem to have been worried about contestants finishing early, and made the last two problems just a tiny bit too hard - quite a few contestants were close on E, and I found out after the analysis was published that my solution for F was not that far from the correct one, but nobody could get either problem accepted.

Problem D was a nice logical puzzle: you are given 200000 points on the coordinate plane, and want to find two points such that their bounding box has a positive area, but does not contain any other points inside or on the border, or report that there's no such pair. After solving the "yes/no" part, don't forget to also think about finding the specific pair :)
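Before looking for the fast solution, a tiny brute force makes the statement concrete and is handy for stress-testing. Here is a sketch (the function name and the cubic approach are mine, not from the contest - it only works for small inputs):

```python
from itertools import combinations

def empty_box_pair(points):
    """Brute force: find two points whose bounding box has positive area
    and contains no other point (inside or on the border), or None.
    Cubic time - only suitable for small inputs, unlike the contest's n = 200000."""
    pts = set(points)
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2 or y1 == y2:
            continue  # zero-area bounding box
        lo_x, hi_x = min(x1, x2), max(x1, x2)
        lo_y, hi_y = min(y1, y2), max(y1, y2)
        if not any(lo_x <= x <= hi_x and lo_y <= y <= hi_y
                   and (x, y) not in ((x1, y1), (x2, y2))
                   for (x, y) in pts):
            return (x1, y1), (x2, y2)
    return None
```

Note how even the 2x2 grid of points already has answer "no": every positive-area pair has the other two points on its border.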

And finally, TopCoder Open 2015 Round 3B determined the last 6 participants of one more major tournament (problems, results, top 6 on the left, my screencast, Egor's screencast). Scott_wu won with a comfortable margin, but the most exciting aspect for me as a viewer was Egor's climb into the qualifying zone during the challenge phase - here's the direct link to the critical moment of his screencast.

The medium problem was the nicest in my opinion. You were given 200000 cookies, with two players taking those cookies in turn one by one. Each cookie has some value for the first player, and some possibly different value for the second player. It is known that the second player always takes the cookie with the highest value for him, and when there are several, the one with the highest value for the first player among those. The first player, on the other hand, is more flexible. He can take any cookie, and his goal is to maximize the total value (to him) of the cookies he ends up with, minus the total value (to the second player) of the cookies the second player ends up with. What is the optimal strategy for the first player?
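Since the second player's moves are forced, a brute force only needs to branch on the first player's choices, which makes it a convenient reference for stress-testing a clever solution. A sketch, exponential and thus for tiny inputs only (names and the assumption that the first player moves first are mine):

```python
from functools import lru_cache

def best_score(a, b):
    """Exhaustive search for the first player's optimal score.
    a[i] is cookie i's value to the first player, b[i] its value to the
    second player. The second player's move is deterministic (greedy),
    so we only branch on the first player's choices."""
    n = len(a)

    @lru_cache(maxsize=None)
    def go(remaining):  # remaining cookies, first player to move
        if not remaining:
            return 0
        best = float('-inf')
        for i in remaining:
            rest = remaining - {i}
            score = a[i]
            if rest:
                # second player: max value to him, ties by value to player 1
                j = max(rest, key=lambda k: (b[k], a[k]))
                score -= b[j]
                score += go(rest - {j})
            best = max(best, score)
        return best

    return go(frozenset(range(n)))
```

For example, with a = [1, 2] and b = [3, 4], the first player's best move is to grab the second cookie, conceding the first to his opponent for a score of 2 - 3 = -1.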

Now we know the 12 finalists of TopCoder Open 2015. There's a very heavy skew towards the ex-USSR this time:
Russian Federation - 7: Petr, UdH-WiNGeR, qwerty787788, kuniavski, Egor, Kankuro, Endagorion
Belarus - 2: tourist, subscriber
United States - 2: scott_wu, ecnerwal
South Africa - 1: bmerry

Thanks for reading, and check back next week!

Tuesday, September 15, 2015

A week with Gambler's Ruin

Codeforces Round 319 took place on Thursday (problems, results, top 5 on the left, solution analysis). I did not take part, but I heard that the problems were somewhat different in style from the last round - and yet the people in the first two places are the same. Huge congratulations to Marcin_smu and mnbvmar!

TopCoder SRM 667 happened one day later (problems, results, top 5 on the left, my screencast). The round was very tricky, with the 250 inviting a lot of incorrect greedy solutions, the 500 requiring a lot of careful mathematical reasoning, and the 1000 being a "constructive" problem which was hard to even approach.

W4yneb0t would have been a clear winner thanks to his correct solution for the 1000 - if not for the challenge phase, where K.A.D.R found 10 incorrect solutions for the 250 and managed to climb to first place despite solving only the easiest problem - great job!

Coming up with a good challenge case wasn't easy at all. Here's what the problem was about: you have 20 skills to learn and 50 exercises, each exercise will give you a certain subset of skills (the subset for each exercise is fixed, but can be different for different exercises). If there are k skills you don't have yet among the skills that the exercise gives you, then you spend k^2 effort on that exercise. What is the minimum total effort you must spend to learn all skills? Naturally, the greedy solutions just picked an exercise that adds the smallest number of skills at each step. Can you see how to make them fail? I'm also wondering how people came up with such testcases during the round - was stress-testing the way to go?
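Stress-testing is certainly one way to hunt for such cases: compare the greedy against an exact answer on small random inputs. A sketch with skills as bitmasks (all names and random parameters are mine; the bitmask DP is one exact reference solution for the statement as paraphrased here, and whether the loop actually uncovers a disagreement depends on the parameters chosen):

```python
import random

def min_effort(exercises, skills):
    # Exact answer: dp[mask] = minimum effort to have learned the skill
    # set `mask`; doing exercise e from mask costs (new skills)^2.
    INF = float('inf')
    dp = [INF] * (1 << skills)
    dp[0] = 0
    for mask in range(1 << skills):
        if dp[mask] == INF:
            continue
        for e in exercises:
            k = bin(e & ~mask).count('1')
            if dp[mask] + k * k < dp[mask | e]:
                dp[mask | e] = dp[mask] + k * k
    return dp[(1 << skills) - 1]

def greedy_effort(exercises, skills):
    # The naive greedy: always do an exercise adding the fewest
    # (but at least one) new skills. Assumes all skills are obtainable.
    full = (1 << skills) - 1
    mask, total = 0, 0
    while mask != full:
        k, e = min((bin(x & ~mask).count('1'), x)
                   for x in exercises if x & ~mask)
        total += k * k
        mask |= e
    return total

# Stress test: random small cases; report any disagreement found.
random.seed(42)
for _ in range(1000):
    ex = [random.randrange(1, 1 << 5) for _ in range(6)] + [(1 << 5) - 1]
    if greedy_effort(ex, 5) != min_effort(ex, 5):
        print('counterexample:', [bin(x) for x in ex])
        break
```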

The careful mathematical reasoning in the 500 was basically about finding a closed form solution to gambler's ruin. Of course, since this is a well-known task, one could replace the mathematical reasoning with Googling, which I duly did. I didn't know the term "gambler's ruin", though, so you can watch how I gradually converged to the right search terms starting from this moment of the screencast :)
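For reference, the classical closed form: starting with bankroll i and playing rounds won with probability p until reaching either 0 or n, the probability of reaching n first is i/n in the fair case, and (1-r^i)/(1-r^n) with r=(1-p)/p otherwise. A quick sanity check via value iteration on the defining recurrence (function names are mine):

```python
def ruin_probability(i, n, p):
    """Closed form for gambler's ruin: probability of reaching n before 0,
    starting from i, winning each round with probability p."""
    if p == 0.5:
        return i / n
    r = (1 - p) / p
    return (1 - r ** i) / (1 - r ** n)

def ruin_probability_iterative(i, n, p, iters=20000):
    """Numeric check by iterating p_k = p*p_{k+1} + (1-p)*p_{k-1}
    with boundary conditions p_0 = 0 and p_n = 1."""
    v = [0.0] * (n + 1)
    v[n] = 1.0
    for _ in range(iters):
        v = ([0.0]
             + [p * v[k + 1] + (1 - p) * v[k - 1] for k in range(1, n)]
             + [1.0])
    return v[i]
```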

Open Cup 2015-16 Grand Prix of Ukraine ignited the new season of the Open Cup on Sunday (results, top 5 on the left). Quite a few veteran teams participated, but the first place still went to a team that will participate in ACM ICPC this year - great job, NNSU!

Problem K, while being quite standard, allowed one to practice various dynamic programming techniques from this blog post. It went like this: you are given N numbers, and need to split them into M consecutive groups. The first number in each group is multiplied by 1, the second number in each group is multiplied by 2, and so on, and you need to minimize the sum of those products. And of course, you need a solution that's faster than O(N^3).

During the contest, my teammate Ilya got the divide and conquer approach accepted; in upsolving, I got it accepted using something similar to Knuth's optimization, and the post-match discussion mentions that the convex hull optimization also worked well :)
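As an illustration, here is a sketch of the divide and conquer optimization for this problem (all names are mine; it assumes nonnegative numbers, in which case the group cost satisfies the quadrangle inequality and the optimal split points are monotone, which is what the optimization relies on):

```python
def min_weighted_split(a, m):
    """Split a into m consecutive non-empty groups, weighting the k-th
    element of each group by k; minimize the total. Runs in
    O(n * m * log n) instead of the naive O(n^2 * m)."""
    n = len(a)
    pre = [0] * (n + 1)   # pre[k]  = sum of a[0..k-1]
    wpre = [0] * (n + 1)  # wpre[k] = sum of (t+1)*a[t] for t < k
    for t in range(n):
        pre[t + 1] = pre[t] + a[t]
        wpre[t + 1] = wpre[t] + (t + 1) * a[t]

    def cost(i, j):  # cost of a single group a[i..j-1], via prefix sums
        return (wpre[j] - wpre[i]) - i * (pre[j] - pre[i])

    INF = float('inf')
    dp = [INF] * (n + 1)
    dp[0] = 0
    for _ in range(m):
        ndp = [INF] * (n + 1)

        def solve(jl, jr, il, ir):
            # fill ndp[j] for j in [jl, jr], knowing the optimal split
            # point for each of them lies in [il, ir]
            if jl > jr:
                return
            j = (jl + jr) // 2
            best, opt = INF, il
            for i in range(il, min(ir, j - 1) + 1):
                v = dp[i] + cost(i, j)
                if v < best:
                    best, opt = v, i
            ndp[j] = best
            solve(jl, j - 1, il, opt)
            solve(j + 1, jr, opt, ir)

        solve(1, n, 0, n - 1)
        dp = ndp
    return dp[n]
```

For instance, [1, 2, 3, 4] split into 2 groups is best cut in the middle: (1*1 + 2*2) + (3*1 + 4*2) = 16.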

Finally, let me come back to the 4x4 problem from last week, counting the number of AxB rectangles of 0s and 1s such that each CxD subrectangle contains the same number of 1s. C and D are up to 4, so the subrectangle is very small, but A and B are up to a billion.
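Before diving into the solution, here is a tiny exhaustive counter that restates the problem precisely and is useful for checking the clever solution on small cases (names are mine; it is exponential in A*B, so toy sizes only):

```python
from itertools import product

def count_grids(a_rows, b_cols, c, d):
    """Count a x b grids of 0s and 1s in which every c x d subrectangle
    has the same sum. Exponential brute force - for sanity checks only."""
    count = 0
    for bits in product((0, 1), repeat=a_rows * b_cols):
        g = [bits[r * b_cols:(r + 1) * b_cols] for r in range(a_rows)]
        sums = {sum(g[r + i][s + j] for i in range(c) for j in range(d))
                for r in range(a_rows - c + 1)
                for s in range(b_cols - d + 1)}
        if len(sums) <= 1:
            count += 1
    return count
```

For example, with C=D=1 every cell must equal every other, so only the all-0 and all-1 grids qualify.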

The solution is a classical case of gradually simplifying the problem until it becomes tractable. The first step is: let's try to gather some more knowledge from the fact that each CxD subrectangle has the same sum. More specifically, let's shift a CxD subrectangle to the right by 1. Now we can see that two 1xD columns that are C apart horizontally must also have the same sum.

Now let's shift those two 1xD columns down by 1. We can see that the differences between the corners are the same: a[x+C][y+D]-a[x+C][y]=a[x][y+D]-a[x][y]. Note that if this difference is not zero, then we can uniquely determine the values in the corners: for example, if the difference is 1, then a[x+C][y+D]=a[x][y+D]=1, and a[x+C][y]=a[x][y]=0. This, in turn, simply means that: if we have a[x][y] not equal to a[x][y+D], then all values in corresponding rows with step C are equal: a[x][y]=a[x+C][y]=a[x+2*C][y]=...=a[x-C][y]=a[x-2*C][y]=..., and the same for row y+D; a similar statement is true for columns.

At this point we can see that we've learned a lot about relations between cells that differ by a multiple of C horizontally and by a multiple of D vertically. In order to dissect the problem further, let's look at each such subgrid independently: take some offsets 0 <= c < C and 0 <= d < D, and consider all cells with coordinates (c+p*C,d+q*D). The above observation has a very concise meaning for this subgrid: whenever two horizontally adjacent cells in the subgrid have different values, the columns containing them are constant; and whenever two vertically adjacent cells in the subgrid have different values, the rows containing them are constant.

Now it's quite easy to make the next step: we can notice that the two situations (two constant rows and two constant columns) are mutually exclusive: if there's at least one constant row, then there can be no two constant columns with different values, and vice versa. This, in turn, simply means that there are just two possibilities for the entire subgrid: either all rows are constant (we will call such subgrids "horizontal"), or all columns are constant (we will call such subgrids "vertical"). Of course, a subgrid can be both horizontal and vertical at the same time, in case all values are equal.

Having learned the exact structure of each subgrid, it's time to go back to the full grid. It consists of C*D subgrids, each in the form described above. Let's use the fact that two 1xD subrectangles that are C apart horizontally must have the same sum. They include cells from D different subgrids. The corresponding cells in horizontal subgrids are equal, so we can simply disregard them. The corresponding cells in vertical subgrids might not be equal, so we have some constraints on the vertical subgrids now. In a similar manner, looking at Cx1 subrectangles that are D apart vertically gives us some constraints on the horizontal subgrids. Moreover, those constraints are not just necessary, but sufficient: if every two 1xD subrectangles that are C apart horizontally have the same sum, and every two Cx1 subrectangles that are D apart vertically have the same sum, then all CxD subrectangles have the same sum, too.

So we have reduced the original problem to making sense of the constraints induced on the horizontal/vertical subgrids. We can note that horizontal and vertical subgrids have independent constraints, so we can analyze them separately. Moreover, the vertical subgrids, for example, have constraints in groups of at most D, and each such group of at most D vertical subgrids is constrained independently, so it suffices just to analyze one of them.

And finally, we can see that the remaining problem is not hard anymore. We have a group of n <= D vertical subgrids, and we need the sum in each column to be the same. If this sum is m, then there are C(n,m) ways to fill one column (choosing which m of the n subgrids get a 1), and thus C(n,m)^t ways to fill all subgrids, where t is the number of columns (which is equal either to A/D or A/D+1, depending on the offset of the subgrid). Summing this over all values of m, we get the total number of ways to fill these n vertical subgrids.
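This count, the sum over m of C(n,m)^t, is easy to check against a brute force over tiny matrices (function names are mine; in the real problem t is huge, so one would compute this modulo a prime with fast exponentiation instead):

```python
from itertools import product
from math import comb

def group_ways(n, t):
    """Ways to fill n "vertical" subgrids spanning t columns so that every
    column has the same sum: pick the common column sum m, then each of
    the t columns independently chooses which m of the n subgrids carry
    a 1, giving sum over m of C(n, m)^t."""
    return sum(comb(n, m) ** t for m in range(n + 1))

def group_ways_brute(n, t):
    """Check: count n x t 0/1 matrices whose column sums are all equal."""
    return sum(1 for bits in product((0, 1), repeat=n * t)
               if len({sum(bits[r * t + c] for r in range(n))
                       for c in range(t)}) == 1)
```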

All that's left is to multiply the answers for the independent problems we've solved to get the answer for the problem. Well, almost all :) In the above solution we relied on knowing which subgrids are vertical and which are horizontal. In reality, we don't know that, but we have just C*D <= 16 subgrids, so we can iterate over all 2^(C*D) ways, compute the answer for each case, and add those up.

This solution will, however, count some grids twice: when a certain subgrid is constant, it is both vertical and horizontal, so we will count a grid containing such a subgrid several times. In order to avoid that, we should remove constant subgrids from one of the sides, let's say from horizontal subgrids, and that will allow us to count each grid exactly once. In order to count the number of ways to fill a group of horizontal-but-not-constant subgrids, we have to use the inclusion-exclusion principle: start with the above computation that allows constant subgrids, subtract the number of ways to fill the grids such that one subgrid is constant, add the number of ways to fill the grids such that two subgrids are constant, and so on.

Finally, all loose ends are tied and we have a solution. Despite the length of the explanation, the code itself ends up being quite short. And the key idea, again, is to continue to simplify the problem in many small steps, and not give up early.

Thanks for reading (has anybody read this far?), and check back next week!

Monday, September 7, 2015

A week with 4x4

TopCoder Open 2015 Round 3A took place on Saturday (problems, results, top 6 on the left). The first 6 places in the onsite competition in Indianapolis were handed out to those who could solve either the medium or the hard problem, with another 6 to be selected in Round 3B that will take place in two weeks. According to the official website, we will have a semifinal round and a final round onsite, but the number of advancers from the semifinal isn't known yet. A similar structure was used in 2009, with 18 onsite participants competing in a single semifinal and 8 qualifying to the final. Scaling that down to 12 onsite participants means we'd have 5 contestants in the final, but of course only TopCoder knows for sure.

The hard problem has left me stunned: how come such a simple question has never been studied before, and yet requires a complex but doable solution? The problem simply asked about the number of AxB rectangles of 0s and 1s such that each CxD subrectangle contains the same number of 1s. C and D are up to 4, so the subrectangle is very small, but A and B are up to a billion.

Coming back to a problem I mentioned last week: given a tree and a vertex selected in it, what's the maximum number of different vertices a walk of a given length L starting from the given vertex can visit?

The main idea of the solution is that walks in a tree do not have a lot of freedom. More specifically, if a walk in a tree starts and ends in the same vertex, then it traverses each edge an even number of times - for each step "down" there's a corresponding step "up", which in turn means we traverse each edge either 0 times or at least twice. Because of this, a cyclic walk of length L visits at most L/2 different vertices in addition to the starting/ending vertex. Suppose now that the walk is not cyclic, going from some vertex A to another vertex B. Again, since walks in a tree do not have a lot of freedom, it's not hard to see that this walk can be represented as the direct walk from A to B along the tree, plus some cyclic walks inserted into it, so we get 1 new vertex per edge when walking from A to B, and 0.5 new vertices per edge when walking along the cycles, and the maximum number of different visited vertices is 1+d+(L-d)/2 (rounding down), where d is the distance between A and B. Now it's clear that we should simply pick B to be as far from A as possible, but at most L edges far.
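The resulting algorithm is just a BFS plus the formula above. A sketch (the adjacency-list representation and the explicit cap at n are my additions - we can never visit more than n distinct vertices, and since the formula is non-decreasing in d, taking the farthest vertex within distance L is enough):

```python
from collections import deque

def max_visited(adj, start, L):
    """Max number of distinct vertices a walk of length L from `start`
    can visit in the tree given by adjacency list `adj`."""
    n = len(adj)
    # BFS to get distances from the start vertex.
    dist = [-1] * n
    dist[start] = 0
    q = deque([start])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if dist[w] == -1:
                dist[w] = dist[v] + 1
                q.append(w)
    # 1 + d + (L - d) // 2 grows with d, so use the farthest reachable
    # vertex at distance at most L, and cap the answer at n.
    d = max(x for x in dist if x <= L)
    return min(n, 1 + d + (L - d) // 2)
```

On a path 0-1-2-3 with L=2 starting at vertex 0 this gives 3 (walk straight), while with a large L it correctly caps at 4.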

Thanks for reading, and check back next week!