Wednesday, May 18, 2016

A 400-500 week

Let's take a quick break from the World Finals practice session and take a look at last week's events. Codeforces Round 352 on Wednesday was one of the last chances to practice before this week's event (problems, results, top 5 on the left, analysis). Quite fittingly, three out of the top five spots were taken by the World Finals competitors, including yet another first place for subscriber – he is on fire!

TopCoder Open 2016 Round 2A took place a day later (problems, results, top 5 on the left, my screencast). The point values of 400, 500 and 1000 (instead of the usual 250, 500, 1000) gave a hint that this would not be a usual round, and it did go strangely indeed (a self-fulfilling prophecy?..) The 1000 was probably the most "typical" problem of the round, so going for it after the 400 has paid off for me. I've also managed to get the 500 submitted and even accepted, but there actually exists a test that makes my solution time out. Can you come up with one? Here's the solution (requires TopCoder login). Bonus points if your test makes it time out even after shuffling the vertices – I don't know how to do that :)

Here's the statement of the hard problem that saved my day: you have 40000 tokens which you can spend on 15 vending machines. Each machine takes tokens, and sometimes gives a prize in response (but sometimes not). The probability of getting a prize after inserting K tokens and not getting a prize on the first K-1 attempts is min(1, K²/Li²), where Li is a constant, different for different vending machines. After you get a prize from a machine, its state resets to the initial one (so K in the above formula becomes the number of tokens inserted after that). Your strategy must be as follows: pick some vending machine, keep inserting tokens into it until you get a prize, then pick a different vending machine, insert tokens until you get a prize there, then pick a different vending machine again (this one might be the same as some previously used machine, but not the one that gave the last prize), and so on until you run out of tokens. The prizes from the i-th vending machine have value Vi. How do you maximize the total expected value of all prizes you get?
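By the way, the formula is easier to get a feel for with a bit of code. Here's a tiny sketch (the names are mine, and it assumes my reading of the statement: min(1, K²/L²) is the probability of success on the K-th token given that the first K-1 tokens gave nothing) that computes the distribution of the number of tokens one machine eats before giving a prize:

def prize_distribution(L):
    # dist[k-1] = probability that the prize arrives exactly on the k-th token
    dist = []
    alive = 1.0                     # probability of no prize so far
    k = 0
    while alive > 0:
        k += 1
        p = min(1.0, k * k / (L * L))   # chance of success on token k, given no success before
        dist.append(alive * p)
        alive *= 1.0 - p
    return dist

dist = prize_distribution(10)
print(len(dist))                                      # 10: the prize always arrives within L tokens
print(sum((k + 1) * p for k, p in enumerate(dist)))   # expected number of tokens per prize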

Thanks for reading, and keep checking my twitter for ICPC 2016 updates!

Tuesday, May 17, 2016

A Polish week

TopCoder SRM 690 took place in the very early hours of Wednesday, May 4th (problems, results, top 5 on the left). Snuke has managed to score 50 additional points during the challenge phase, and that’s what allowed him to jump from third to first place – congratulations on the first SRM victory!

VK Cup 2016 Round 3 on Saturday selected 20 lucky teams to advance to the onsite finals in St Petersburg (problems, results, top 5 on the left, online mirror results, analysis). The rating favorite team “Never Lucky” bounced back from the relatively weak 4th place showing in Round 2 to win this round by over 1000 points – great job! Nevertheless, there are many other strong teams in the top 20 who will make sure the St Petersburg final round isn’t easy for subscriber and tourist.

Google Code Jam Round 1C selected the final 1000 participants for Round 2 (problems, results, top 5 on the left, analysis). Among the top scorers were the reigning champion Gennady.Korotkevich and linguo who doesn’t participate in competitions other than Google Code Jam but nevertheless does very well each year – but they couldn’t take the first place from artberryx – congratulations on the win!

Finally, Russian Code Cup 2016 Qualification Round 1 wrapped up the tournament-heavy weekend late on Sunday (problems, results, top 5 on the left, analysis). Subscriber pulled out another victory just a few days before the ACM ICPC World Finals, suggesting that the St Petersburg ITMO team is still very much in the running for the top spots there – way to go!

The last problem E required one to combine two relatively standard but normally unrelated algorithms into one solution: given two trees with at most 50 vertices in each one, find the size of the largest connected subtree in the first tree that has an isomorphic connected subtree in the second tree.

In my last summary, I've mentioned a problem that I couldn't solve: you are given two strings of the same length consisting of lowercase English letters, and want to transform the first string into the second one. You are allowed operations of two kinds: changing one character into another takes 1 second, and changing all occurrences of some letter into some other letter (the same for all occurrences) takes C seconds. What is the fastest way to do the required transformation?

Tomek Jarosik has posted a link to the analysis from this task's original contest in the comments. Here's my rough understanding.

First of all, it's not hard to see that we can do all group changes before all individual changes – if we're going to spend 1 second on a given character anyway, we can do it at the end and change it directly to what we need. Because of this, the task can be reformulated: now we're only allowed to do group changes, each costing C, and need to minimize the total cost of the group changes plus the number of positions in which the resulting string differs from the goal.

Now let's draw a graph where the vertices are the 26 letters. The edges, somewhat counter-intuitively, will not be the group changes we make – instead, let's draw an edge from each letter to the letter that all its occurrences will change to after all group changes. For example, if we change all As to Bs, then all Cs to As, and then all Bs to Cs, then our graph will have edges A->C, B->C, C->A. Each letter will have 0 or 1 outgoing edge.
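A tiny sanity check of this definition (just an illustration, not part of the solution): apply a sequence of group changes and look where every letter ends up.

def final_mapping(ops, letters="abcdefghijklmnopqrstuvwxyz"):
    dest = {c: c for c in letters}            # dest[c]: what the original c's look like right now
    for x, y in ops:                          # "change all occurrences of x into y"
        for c in letters:
            if dest[c] == x:
                dest[c] = y
    return dest

m = final_mapping([('a', 'b'), ('c', 'a'), ('b', 'c')])
print(m['a'], m['b'], m['c'])                 # c c a, i.e. edges A->C, B->C, C->A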

Given such a graph, it's easy to count the total number of positions in which the resulting string differs from the goal: since we know how each letter ends up, we know the resulting string, and can simply count the differences. The other part of the value we minimize is not so clear: what is the minimum number of group changes required to construct the given graph?
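Counting those differences is straightforward once the final mapping is known, for example:

def mismatches(s, t, dest):
    # dest[c]: the letter all c's end up as after the group changes (c itself if untouched)
    return sum(1 for x, y in zip(s, t) if dest.get(x, x) != y)

print(mismatches("abca", "ccca", {'a': 'c', 'b': 'c'}))   # 1: the last position becomes 'c' but needs 'a'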

The lower bound on the number of group changes is the total number of edges in this graph. Moreover, in most cases this lower bound is actually the answer. To see that, let's consider the structure of our graph. Since each vertex has at most one outgoing edge, the connected components of this graph are either directed trees, or cycles with directed trees growing into some of their nodes.

In order to find the group operations that construct the given directed tree, we can start from the root (the "sink"), apply the operations for all letters that need to be changed into the root letter, then continue with the letters that need to be changed to those letters, and so on.
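In other words, we emit the operations from the sink outwards. A short sketch (again with made-up names):

from collections import defaultdict, deque

def tree_ops(dest):
    # dest[a]: the letter a must end up as; dest[a] == a means "unchanged".
    # Assumes the component is a directed tree (no cycles).
    rev = defaultdict(list)                   # rev[b]: letters whose edge points to b
    for a, b in dest.items():
        if a != b:
            rev[b].append(a)
    ops = []
    queue = deque(a for a, b in dest.items() if a == b)   # the sinks
    while queue:
        b = queue.popleft()
        for a in rev[b]:
            ops.append((a, dest[a]))          # it's now safe to turn all a's into dest[a]
            queue.append(a)
    return ops

print(tree_ops({'a': 'c', 'b': 'c', 'c': 'd', 'd': 'd'}))   # [('c', 'd'), ('a', 'c'), ('b', 'c')]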

When we have a cycle, we can't do the same. In fact, when a connected component of the graph forms a cycle of length K, we can't really implement the changes using just K group operations – we need K+1 in that case. To see that, notice that the first operation we make can not transform one of the letters in the cycle into another, as that would "merge" two letters together that need to be separate in the end. Instead, we must use an auxiliary letter that's not in the cycle. For example, if we need to build A->B->C->D->A, then we need to do something like: A to Z, then D to A, then C to D, then B to C, then Z to B.

However, when a connected component is not just a cycle – in other words has one or more directed trees going into the cycle's vertices – then we don't need an extra group operation for this cycle. For example, suppose the above A->B->C->D->A cycle also had a E->C incoming edge. Then we can do: B to E, then A to B, then D to A, then C to D, then E to C, handling both the cycle of length 4 and an extra incoming edge in 5 total group operations, and achieving the lower bound. If there's more than one edge going into the cycle, we can handle the rest as separate directed trees after handling the cycle.

There's one more constraint that we need to take care of: the standalone cycle resolution mentioned above needs an auxiliary letter (denoted as Z in the example above). In case there's a letter with an outgoing edge but without incoming edges, it can perform the role of that auxiliary letter: first, handle the component containing it, and then use it as the auxiliary letter for all standalone cycles. If, however, there's no such letter – in other words, if the entire graph consists of standalone cycles and isolated vertices – then it's not clear what to do. After some more thinking we can notice that if such a graph has at least one standalone cycle (in other words, at least one edge), then it's impossible to construct it at all, since it implements a non-identity permutation on the letters, and our very first group operation will merge two letters together.

Let's summarize what we have reduced our problem to. We need to build a graph on the 26 letters as vertices, with at most one outgoing edge for every vertex. If we have an edge from letter A to letter B, then its cost is the number of mismatches that it yields (the number of positions where the first string has A and the second string does not have B) plus C (for the group operation corresponding to this edge). If we decide not to have an edge from letter A, this also has a cost (the number of positions where the first string has A and the second string does not have A). In addition, we get an extra cost of C for any standalone cycle in our graph, and the graph must not be all standalone cycles and isolated vertices, but it can be all isolated vertices. We need to minimize the total cost.
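In code, the cost table could look roughly like this (the variable names and the example input are mine):

s, t, C = "abacaba", "bcdcbcd", 2              # example input, just to make the sketch runnable
A = 26

cnt = [[0] * A for _ in range(A)]              # cnt[a][b]: positions where s has a and t has b
for x, y in zip(s, t):
    cnt[ord(x) - ord('a')][ord(y) - ord('a')] += 1
tot = [sum(row) for row in cnt]                # tot[a]: occurrences of a in s

# cost[a][b] for b != a: the edge a->b (C for the group change plus the mismatches it leaves);
# cost[a][a]: no outgoing edge, only the mismatches.
cost = [[tot[a] - cnt[a][b] + (C if b != a else 0) for b in range(A)] for a in range(A)]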

Our team got this far at the actual contest, but we couldn't come up with a way to solve this reduced problem. It turns out that solving it relies on one key insight: let's build this graph greedily: for each letter, pick the outgoing edge (or lack thereof) that minimizes the contribution to the total cost (taking into account the cost C of having the edge, but not the additional penalty for a standalone cycle or the not-all-standalone-cycles restriction). This will give some graph G0. If this graph does not have standalone cycles, then we're done. And in case it does, it turns out that those are the only cycles that we need to take care of!
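Continuing the sketch above, the greedy graph G0 and its standalone cycles can be found like this:

g0 = [min(range(A), key=lambda b: cost[a][b]) for a in range(A)]   # g0[a] == a means "no edge"

indeg = [0] * A
for a in range(A):
    if g0[a] != a:
        indeg[g0[a]] += 1

cycles = []             # standalone cycles of G0, as lists of letters
cycle_id = [-1] * A     # which standalone cycle a letter belongs to, if any
visited = [False] * A
for start in range(A):
    path, v = [], start
    while not visited[v]:
        visited[v] = True
        path.append(v)
        v = g0[v]
    if v in path:                               # the walk closed into a new cycle
        cyc = path[path.index(v):]
        # the cycle is standalone if nothing outside it points into it
        if len(cyc) >= 2 and all(indeg[x] == 1 for x in cyc):
            for x in cyc:
                cycle_id[x] = len(cycles)
            cycles.append(cyc)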

More precisely, we can relax the problem as follows without changing the answer: instead of getting an extra cost of C for any standalone cycle, we will only get an extra cost of C for any standalone cycle that is present in G0. Similarly, instead of forbidding the graph to consist of only standalone cycles and isolated vertices, we will only forbid the graph to be identical to G0, and only if the latter consists of only standalone cycles and isolated vertices.

To see why the answer doesn't change, consider some optimal solution for the relaxed problem. Suppose it has some unexpected standalone cycle that is not present in G0. At least one vertex on this cycle has a different outgoing edge in G0. Let's change that edge to the one from G0. The total cost will not increase since G0 is locally optimal, and at the same time the standalone cycle will be destroyed, and no new standalone cycles will form (since it's impossible to do that with just one edge change). By continuing this process, we can show that there exists an optimal solution for the relaxed problem that is also a valid solution for the original problem.

Finally, we can solve the relaxed problem using dynamic programming: we process all letters in order, and pick the outgoing edge (or lack thereof) for each of them, while remembering which cycles from G0 we have already broken, and whether our graph is still identical to G0. Since G0 has at most N/2 cycles, the running time is O(2^(N/2)*N²), which is very small for N=26.
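Here's my rough sketch of that dynamic programming, continuing the names from the sketches above. One detail is my own interpretation: I count a standalone cycle of G0 as "broken" as soon as either some letter on it picks a different edge than in G0, or some other letter points into it, since in both cases it stops being a standalone cycle of the graph we're building.

K = len(cycles)
g0_is_all_cycles = all(g0[a] == a or cycle_id[a] >= 0 for a in range(A))
g0_has_edge = any(g0[a] != a for a in range(A))
INF = float('inf')

# dp[mask][same]: cheapest way to choose edges for the letters processed so far;
# mask = standalone cycles of G0 already broken, same = all choices so far matched G0
dp = [[INF, INF] for _ in range(1 << K)]
dp[0][1] = 0
for a in range(A):
    ndp = [[INF, INF] for _ in range(1 << K)]
    for mask in range(1 << K):
        for same in range(2):
            cur = dp[mask][same]
            if cur == INF:
                continue
            for b in range(A):                       # the choice for letter a (b == a: no edge)
                nmask = mask
                if cycle_id[a] >= 0 and b != g0[a]:
                    nmask |= 1 << cycle_id[a]        # a letter of the cycle deviates from G0
                if b != a and cycle_id[b] >= 0 and b != g0[a]:
                    nmask |= 1 << cycle_id[b]        # an extra edge now enters that cycle
                nsame = 1 if same and b == g0[a] else 0
                val = cur + cost[a][b]
                if val < ndp[nmask][nsame]:
                    ndp[nmask][nsame] = val
    dp = ndp

answer = INF
for mask in range(1 << K):
    for same in range(2):
        if dp[mask][same] == INF:
            continue
        if same and g0_is_all_cycles and g0_has_edge:
            continue                                 # the graph we can't construct at all
        extra = C * (K - bin(mask).count('1'))       # each surviving standalone cycle needs one extra op
        answer = min(answer, dp[mask][same] + extra)
print(answer)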

Thanks for reading, and check back soon for the last week's summary. Also make sure to check my twitter for the ACM ICPC 2016 World Finals updates!

Thursday, May 5, 2016

A 26x26 week

First of all, let's come back to the Grand Prix of Southern Caucasus which wrapped up the 2015-16 season of the Open Cup the week before last but had its results published only last week (results, top 5 on the left). Team Havka-papstvo ended the season on the highest possible note, solving all problems except the most difficult one, and doing it almost without bugs — great job!

I still have no idea how to solve the aforementioned most difficult problem B — maybe the readers of this blog can help? The problem statement is extremely simple: you are given two strings of the same length consisting of lowercase English letters, and want to transform the first string into the second one. You are allowed operations of two kinds: changing one character into another takes 1 second, and changing all occurrences of some letter into some other letter (the same for all occurrences) takes C seconds. What is the fastest way to do the required transformation?

The strings have at most one million characters, but that's not really important since what matters is how many times each of the 26x26 (old letter, new letter) pairs appears.

Since this was the last round of the Open Cup season, the overall standings of the cup are now final (results, top 5 on the left). Unlike last year, most (7 out of 10) of the top finishers are coming to the ACM ICPC World Finals in Phuket, which looks to be very exciting!

TopCoder Open 2016 Round 1C on Wednesday presented the last opportunity to qualify for Round 2 and stay in contention for the finals in Washington, DC (problems, results, top 5 on the left). Four top competitors on the scoreboard have solved all three problems, but krijgertje was much faster in doing so — congratulations!

Codeforces Round 349 offered an opportunity to start the weekend competitively on Friday night (problems, results, top 5 on the left, analysis). Problem D was a nice exercise about implementing a problem with many possible cases in such a way that (almost) no case analysis is present in the solution. It went like this: you are given four points on the plane. You can shift each point either horizontally or vertically, but not in both directions. Different points can be moved in different directions — for example, three points can be moved vertically and the fourth one horizontally. Your goal is to get the points to become the corners of an axis-parallel square, while minimizing the maximum distance traveled by one of the points.

Finally, Google Code Jam 2016 Round 1B on Saturday allowed those who failed a problem in Round 1A a second chance (problems, results, top 5 on the left, analysis). ikatanic shared the first place with rng_58, but he has also appeared in the top 5 in another contest this week, so he wins by tie-breaking :) Congratulations to both!

In my previous summary, I've mentioned the 0.636 problem: you are given 2n points on the plane, no three points on the same line, and a matching between them – n segments connecting the points to each other in such a way that each point is the endpoint of exactly one segment. Your goal is to find another matching with an additional property: the segments must not intersect. That would be a classical problem, but here there’s an extra constraint that changes the problem completely: the total length of the segments in your matching must be at least 0.636 (zero point six three six) times the total length of the segments in the given matching. n is at most 100.

The first step in solving this problem is figuring out where the strange number 0.636 comes from. It's funny that just googling that constant yields the answer: this number is slightly less than 2/π, and that, in turn, is the expected length of the projection of a segment of length 1 onto a random line.

This leads us to the first step of the solution: if we pick a random line and project our matching onto that line, then the expected total length of the projection will be 2/π times the total length of the original matching. Because of this, we can try several random lines and quickly find a line L with a projection that's at least 0.636 times the total length of the original matching.
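A sketch of this step (the names are mine): pick random directions until the projection is long enough.

import math, random

def projected_length(matching, theta):
    # total length of the projections of all segments onto a line with direction theta
    return sum(abs((q[0] - p[0]) * math.cos(theta) + (q[1] - p[1]) * math.sin(theta))
               for p, q in matching)

def find_direction(matching, tries=1000):
    total = sum(math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in matching)
    for _ in range(tries):
        theta = random.uniform(0, math.pi)
        if projected_length(matching, theta) >= 0.636 * total:
            return theta
    return None        # practically unreachable: the expected projected length is (2/pi)*total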

How does this help? Our goal now will be to find a non-intersecting matching with total length greater than or equal to the total length of this projection. Since the total length of a matching is greater than or equal to the total length of its projection, it is enough to find a non-intersecting matching whose projection onto the same line L is greater than or equal in length to the projection of the original matching. And for that, in turn, it is enough to find a non-intersecting matching whose projection onto the same line L has the maximum possible total length.

Which matchings between 2n points on a line have maximum possible total length? This is a reasonably easy question: it's those matchings where the left end of each segment is among the n leftmost points on the line, and the right end of each segment is among the n rightmost points on the line. Moreover, any such matching has this largest possible total length!

Which reduces our problem to the following: we have 2n points on the plane, n of them colored blue (the ones with the n leftmost projections onto L) and the remaining n colored red, and we need to find any non-intersecting matching connecting blue points to red points, which is a well-known task with many possible solutions. One solution starts with any matching and then "flips" intersecting segments until there are none (this will end in finite time since the total length strictly decreases); another takes advantage of the fact that the groups of n blue and n red points are separated from each other in our case, and thus we can find a common tangent to their convex hulls, pair the corresponding red and blue points, and repeat.
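Here's a sketch of the first approach applied to our blue/red setting (the names are mine). Since no three points are collinear and no two segments share an endpoint, a strict crossing test suffices, and each swap strictly decreases the total length by the triangle inequality, so the process terminates.

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_cross(p1, p2, p3, p4):
    # proper crossing test; sufficient when no three of the given points are collinear
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return (d1 > 0) != (d2 > 0) and (d3 > 0) != (d4 > 0)

def uncross(blue, red):
    # start with an arbitrary blue-red matching and flip crossing pairs until none remain
    n = len(blue)
    match = list(range(n))                 # blue[i] is paired with red[match[i]]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                if segments_cross(blue[i], red[match[i]], blue[j], red[match[j]]):
                    match[i], match[j] = match[j], match[i]   # swapping shortens the total length
                    changed = True
    return match

Note that swapping the red endpoints keeps every segment going from a blue point to a red point, so the total projection onto L (the sum of the red projections minus the sum of the blue projections) does not change, and the length guarantee is preserved.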

Thanks for reading, and check back next week!