Algorithms Weekly by Petr Mitrichev: November 2016

Sunday, November 20, 2016

Yet another triangle week

The first November week included the fifth stage of the 2016-17 Open Cup, the Grand Prix of Siberia (problems, results, top 5 on the left). Team Havka-papstvo dominated the proceedings, and topped the cake with the only passing solution for the very tedious problem 10 - congratulations!

I'd like to highlight the interactive problem 3, which turned out to be the second most difficult. The judging program has a hidden non-degenerate triangle. The coordinates of its vertices are integers not exceeding 1000 by absolute value. You're allowed to make at most 1000 queries, and your goal is to determine the coordinates of the vertices of the triangle. In each query, you choose a line (more precisely, a half-plane), and you're told the areas of the two parts this line splits the triangle into (one of them can be 0 if it doesn't intersect the triangle). There are two steps to solving this one: figuring out how to do it, and then figuring out how to do it in such a way that the implementation isn't a nightmare :)

In my previous summary, I've described another triangle problem: you are given 200000 points on the plane which are guaranteed to be randomly placed within a bounding box, and need to find three points which form a triangle with the largest area.

The solution is surprisingly straightforward: we find the convex hull of the given points, and then try all triangles with vertices from the convex hull to determine the largest one. It is correct because it's not hard to see that when we fix 2 out of 3 vertices of the triangle, the third one needs to be as far from that line as possible, which means it's a farthest point in some direction, which means it's a vertex of the convex hull. And it is fast enough because it turns out that a set of random points has only O(logn) points on average on its convex hull (see this paper).

Thanks for reading, and check back soon for the next week's summary!

A triangle week

The last week of October was relatively busy. First off, TopCoder SRM 701 took place on Wednesday (problems, results, top 5 on the left, analysis). The top 3 could be a foreshadowing for the upcoming TCO 2016 Semifinal 2, but unfortunately xudyh could not make it to Washington (does anybody know why?). Nevertheless, congratulations on the SRM victory!

Then, AtCoder held its Grand Contest 006 on Saturday (problems, results, top 5 on the left, analysis). apiad has dominated the field by being the only one to solve all problems, and he has also demonstrated an unorthodox strategy by starting from the hardest problems - so he would've been first even if he didn't finish the 3 easier ones!

And finally, Open Cup 2016-17 Eastern Grand Prix took place on Sunday (problems, results, top 5 on the left). SPb ITMO 1 team has jumped into a big lead in the overall standings after this round - congratulations!

Problem B was on the boundary of "oh, beautiful observation" and "oh, nasty trick" :) Which feeling does it leave you with? You are given 200000 points on the plane which are guaranteed to be randomly placed within a bounding box, and need to find three points which form a triangle with the largest area.

Thanks for reading, and check back soon for the November news!

A Canada week

The Oct 17 - Oct 23 week featured just the Canada Cup by Codeforces (problems, results, top 5 on the left, analysis). If not for challenges, just 5 points would've separated eatmore and tzupengwang - congratulations to both! I could not make this round, so I don't have anything to share about its problems.

Check back soon for the hopefully longer summary of the following week :)

Saturday, November 19, 2016

An unshuffle week

During the Oct 10 - Oct 16 week, TopCoder has hit its next milestone: SRM 700 (problems, results, top 5 on the left, analysis). Despite the starting time of 4am, the first three spots were occupied by contestants from St Petersburg and Moscow. The first place went to _aid, who has recently joined the ACM ICPC World Champion team SPbSU Base - so all other teams will not have an easy time at 2017 World Finals :) Congratulations!

In my previous summary, I have mentioned a problem in the trademark style of Ivan Kazmenko: You're given the following random number generator, an implementation of the Fischer-Yates shuffle, and a program using them:

Function random (range):

state := (state * 1664525 + 1013904223) mod 2³²;

return (state * range) div 2³²;

Function shuffle (array):

n := length of array;

for i := 0, 1, 2, ..., n - 1:

j := random (i + 1);

swap(array[i], array[j]);

return array;

state := seed;

n := 10000;

array := [1, 2, ..., n];

array := shuffle (array);

In other words, we shuffle the numbers from 1 to n twice. The particular generator used here is well-known, so no particular weaknesses could be expected except of course the fact that it's a linear congruential generator. The task is: given the final state of the array, find what the seed was.

There are two main ideas necessary to solve this problem. The first idea is: if we know the numbers that the generator returned on two concrete calls (i.e., on p-th step and on q-th step), there are only so many (on the order of 100) possibilities for the seed that we can try them all, and moreover, we can enumerate all of them efficiently.

In order to enumerate them, we can notice that the value of the state variable on p-th step is a linear function of the initial seed. Because of this, having two return values fixed leads to two inequalities that look like l <= seed * a <= r (mod 2³²). We can solve this system of inequalities using an algorithm similar to the Euclidean one.

The second idea is that it's not actually that hard to find two concrete return values of the generator. During each shuffle, each number participates in at least one swap: when the variable i passes through the cell containing it. It's also not hard to see that on average quite a few numbers will participate in exactly one swap. And because this set of numbers is fairly random for each of the two shuffles, there will be a significant amount of numbers that have participated in exactly two swaps overall. And since we know the starting and ending position for each number, we just need to guess which was its "middle" position after the first swap.

So here's the overall solution structure: we iterate over all numbers from n to 1 that have moved to the left after two shuffles. For each of those numbers, we iterate over the "middle" position between the final positions and the starting position. We find all seeds that would make this number do the corresponding jumps, and then check if each of those seeds leads to the overall result we see. We expect to find the seed after trying just a few initial numbers.

Thanks for reading, and wish me luck in today's TopCoder Open semifinal (starting time, broadcast link)!

Monday, November 7, 2016

A Fourier combination week

The Oct 3 - Oct 9 week was calmer. Intel Code Challenge Final Round has gathered the top competitors for 3 hours instead of the usual 2 we see on Codeforces (problems, results, top 5 on the left, analysis). Maybe TooDifficult did not notice this, as he finished all problems within the first 2 hours anyway, and thus won easily :) Congratulations!

On Sunday, Open Cup 2016-17 Grand Prix of SPb has completed the 3-week run (results, top 5 on the left). Reminding us how things usually go in Open Cup contests, Past Glory team have won convincingly, and were also the only team to solve problem I - great job!

Here's what the problem was about. You're given the following random number generator, an implementation of the Fischer-Yates shuffle, and a program using them:

Function random (range):
state := (state * 1664525 + 1013904223) mod 2³²;
return (state * range) div 2³²;

Function shuffle (array):
n := length of array;
for i := 0, 1, 2, ..., n - 1:
j := random (i + 1);
swap(array[i], array[j]);
return array;

state := seed;
n := 10000;
array := [1, 2, ..., n];
array := shuffle (array);
array := shuffle (array);

In other words, we shuffle the numbers from 1 to n twice. The particular generator used here is well-known, so no particular weaknesses could be expected except of course the fact that it's a linear congruential generator. The task is: given the final state of the array, find what the seed was.

In my previous summary, I've mentioned a tricky AtCoder combinatorics problem: You are given a tree with n<=200000 vertices. Now we consider all C(n,k) ways to pick k vertices of the tree, and for each of them we consider the "convex hull" of the k vertices: the smallest part of the tree that connects all of them together. Your goal is to find the sum of the sizes of those convex hulls over all C(n,k) ways. What's more, you need to find n sums: for each value of k between 1 and n. Each sum must be computed modulo 924844033.

Even though the problem statement doesn't mention probabilities and expected values, the solution starts with an observation that is very similar to the linearity of expectation: in order to find the sum of the sizes of the convex hulls, we will find for each vertex the number of convex hulls containing it, and then add up those numbers.

In order to find the latter, let's find the number of convex hulls not containing it, and subtract it from C(n,k). In order for the convex hull to not contain a given vertex, all k chosen vertices must lie completely within one of the subtrees formed by removing this vertex from the tree. So if the sizes of all 2(n-1) subtrees of the original tree are a₁, a₂, ..., a_2(n-1), then we need to find Σ_iC(a_i, k). Finding all a_i is a standard task solved by a traversal of the tree, so we can solve our problem for one value of k in O(n). However, since we have n possible values of k to process, the total running time will be O(n²) which is too slow.

The idea to speed up such computation from O(n²) to O(nlogn) using Fast Fourier Transform is not new, but I don't think I've described it in this blog before. Let's reformulate the problem slightly: let b_i be the number of subtrees of size i. We need to find Σ_iC(i, k)*b_i for each k. Now let's transform our sum:

Σ_iC(i, k)*b_i = Σ_ib_i*i!/k!/(i-k)! = 1/k!*Σ_i(b_i*i!)*(1/(i-k)!)

Now we can notice that the remaining sum is nothing else but multiplication of polynomials, which can be done fast using FFT. To make it completely clear, let's denote u_i=b_i*i!, and v_j=1/(n-j)!, and multiply two polynomials with those coefficients. The (n+k)-th coefficient of the product will be Σ_iu_iv_n+k-i = Σ_i(b_i*i!)*(1/(n-(n+k-i)!) = Σ_i(b_i*i!)*(1/(i-k)!), just as we need.

To conclude, it also seems that this technique can be applied to a wide variety of sums, not just those involving C(n,k). The only property that we need from the function being summed is that it must decompose as a product of functions of n, k, and n-k. However, C(n,k) seems to be the only practically relevant function with this property. Do you have an example of a different such function in mind, or maybe you can even point me to other problems solved using this trick?

Thanks for reading, and check back for more!

Sunday, November 6, 2016

A week after a month

The last week of September was rather busy. The competitions started right on Monday with TopCoder SRM 699 (problems, results, top 5 on the left, analysis, my screencast). At the end of the coding phase, it seemed unusually easy for the recent TopCoder standard, with 8 submissions for the Hard problem. However, the system testing phase showed that even though the general idea of a solution is not hard to see, figuring out all details correctly within the 75-minute round is nearly impossible. matthew99a has thus earned his third SRM victory thanks to super fast solution for the 500, and a solid 100 challenge points to boot. Congratulations!

AtCoder Grand Contest 005 has followed on Saturday (problems, results, top 5 on the left, analysis - scroll down for English). tourist seemed to have started 10 minutes late, so there were some doubts as to who would win initially, but he quashed all of those by finishing all problems within an hour of starting. Amazing job!

The last problem was a great demonstration of the power of combinatorics. You are given a tree with n<=200000 vertices. Now we consider all C(n,k) ways to pick k vertices of the tree, and for each of them we consider the "convex hull" of the k vertices: the smallest part of the tree that connects all of them together. Your goal is to find the sum of the sizes of those convex hulls over all C(n,k) ways. What's more, you need to find n sums: for each value of k between 1 and n. Each sum must be computed modulo 924844033.

Intel Code Challenge Elimination Round started just 15 minutes after the end of AGC (problems, results, top 5 on the left, analysis), so it's super impressive that anta shows in the top 5 of the both events. However, he could not come close to FatalEagle, who has earned his first Codeforces victory thanks to solving all problems and finding 9 successful challenges while doing so. Congratulations!

The second stage of Open Cup 2016-17, Grand Prix of Eurasia, has wrapped up the week (results, top 5 on the left). Problem 1 looked very daunting initially, but as one unraveled it the actual solution turned out pretty and short. You are given an array with n<=100000 numbers, and want to find the segment of that array with the largest sum. That would be too easy, of course, so there's an added twist: you're allowed to do at most k<=10 swaps of two numbers in the array before picking a segment.

In my previous summary, I've mentioned another tricky Open Cup problem: you are given a connected undirected graph with at most 2000 vertices and 2000 edges, where each edge needs to be colored either red or blue. You need to do the coloring using a single alternating walk: you start in some vertex, then keep moving along the edges, coloring the first edge you pass red, the second blue, the third red again, and so on. You are allowed to pass the same vertex and even the same edge twice, and when you pass the same edge multiple times, the last color used stays. Is it possible to obtain the required coloring?

It seems hard to approach this problem because it seems that we can do anything: if we color an edge with the wrong color, we can come back later and use the correct color again, so it's not clear how to see if we're making good progress. The main observation is to look at this process from the end. The last edge we traverse must be colored with its intended color. The edge we traverse before it must also be colored with its intended color, unless it's the same edge. More generally, at each point we have some set of edges we've already traversed, which can now be traversed with any color, and the remaining edges which must be first traversed with their intended color, and our goal is to traverse each edge at least once.

Also, since can always go back and return to any vertex we've already visited with the same parity, the whole concept of a walk is a bit misleading now. We just have a set of (vertex, parity) pairs we can reach, and a set of edges that we've traversed at least once, and are gradually expanding both sets. If two vertices a and b are connected using an untraversed edge with color c, and we have reached (a, c), then we can traverse the edge, and reach (b, 1-c) pair. If the edge is already traversed, then we can also use it to reach (b, c) from (a, 1-c).

To put the solution together, we will maintain a queue of possible moves. When processing a move in the queue, we will add new moves that were made possible by this move to the queue. It's important here to remember that the new moves are of two types: those starting in the new vertex we've reached, and those that use the edge we just traversed with the other color, both in the reverse and in the same direction - many teams have forgotten to account for the latter. And of course we must start from some vertex and some parity - but since the graph is just 2000 in size, we can just iterate over all starting - more precisely, finishing :) - (vertex, parity) pairs.

Thanks for reading, and check back soon for the next week's summary! I promise, "soon" does not mean a month this time :)