Monday, December 11, 2017

A global shift week

The Sep 18 - Sep 24 week featured the second Open Cup stage: the Grand Prix of Ural (problems, results, top 5 on the left). The Seoul National U 2 team continued to amaze, this time claiming the victory by 1 problem to Past Glory and by 2 problems to everybody else. Very well done!

Later on Sunday, Codeforces hosted a contest titled "Manthan, Codefest 17" (problems, results, top 5 on the left, analysis). anta stumbled on the tricky problem D (fewer than a third of the contestants who submitted it got it accepted), but still came out clearly on top thanks to being the only contestant to submit all problems. Congratulations on the victory!

In my previous summary, I have mentioned a couple of problems. One of them went like this: there are many contestants taking part in a competition, and the top c will be awarded t-shirts. There are n available t-shirt sizes (n<=200000). Each contestant has indicated one of two things: either they need a t-shirt of some concrete size, or they need a t-shirt of one of two concrete adjacent sizes. More precisely, for each size i we know the number a_i of contestants (up to 10^8) that need a t-shirt of size i, and the number b_i of contestants (also up to 10^8) that need either a t-shirt of size i or a t-shirt of size i+1. What is the minimal number of t-shirts we need to buy in order to be able to give t-shirts to the top c contestants, no matter which ones end up in the top c?

First, we can notice that we need to buy a set of t-shirts such that in the resulting (huge) bipartite graph there's a matching covering any set of c nodes in the left part. Hall's theorem gives us an equivalent formulation: we need to make sure that for any set of sizes for which we buy x t-shirts in total, with x less than c, the number of people who can only wear a t-shirt from this set does not exceed x.

Then, we can notice that it suffices to consider only segments of sizes, and not arbitrary sets, in the above condition: if a union of several disjoint segments violates the constraint, then one of those segments already violates it on its own.

Then, just like in many other problems with constraints on segments, we can buy the t-shirts greedily: going in increasing order of sizes, for each size we will buy just enough t-shirts to make sure the constraint is not violated for all segments that end with this size. It doesn't make sense to buy more of this size, as we can replace those extra t-shirts with ones of the next size up without making things worse.

Formally speaking, s_i = max(min(c, a_i), min(c, a_i + b_{i-1} + a_{i-1}) - s_{i-1}, min(c, a_i + b_{i-1} + a_{i-1} + b_{i-2} + a_{i-2}) - s_{i-2} - s_{i-1}, ...). As we move from i to i+1, the arguments to max() change in the following way: the subtracted part increases by s_i for all terms, the a+b part increases by a_{i+1} + b_i, and we get a new term. The increase can cause the a+b part of some terms to jump over c, in which case the corresponding min() will forever stay equal to c, and those terms will be zero or negative in all subsequent steps, so we can forget about them.

This leads to the following approach: let's keep a set of terms for which the min() is still less than c, then we only need to be able to add a constant to the positive part of all terms, add a constant to the negative part of all terms, to find all terms where the positive part exceeds c, and to find the term with the largest difference between positive and negative parts. Such set of operations can be most straightforwardly implemented by keeping two priority queues of those terms: ordered by positive part and ordered by difference, and to implement adding a constant by keeping an extra "global shift" variable.

This yields an O(n log n) solution, which is fast enough, but it can also be made O(n) as described in the official analysis.
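As an aside, the "global shift" trick from above is easy to package. Here is a minimal sketch (the class name and interface are my own, purely for illustration): a max-heap where adding a constant to every stored key takes O(1), because keys are stored relative to a shared offset.

```python
import heapq

class ShiftableHeap:
    """Max-heap of keys supporting 'add delta to ALL keys' in O(1).

    Keys are stored relative to a global shift: pushing subtracts the
    current shift, and popping adds it back.
    """
    def __init__(self):
        self.heap = []   # min-heap of negated relative keys
        self.shift = 0

    def push(self, key):
        heapq.heappush(self.heap, -(key - self.shift))

    def add_to_all(self, delta):
        self.shift += delta

    def max(self):
        return -self.heap[0] + self.shift

    def pop_max(self):
        return -heapq.heappop(self.heap) + self.shift

h = ShiftableHeap()
h.push(5)
h.push(1)
h.add_to_all(10)   # the stored keys are now 15 and 11
h.push(12)
assert h.pop_max() == 15
assert h.pop_max() == 12
```

The real solution keeps two such structures over the same set of terms, one ordered by the positive part and one by the difference, both sharing this constant-time bulk update.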

Another problem I mentioned was: two players have a sequence of 50000 integers, and alternately take numbers from either end of the sequence. When all numbers have been taken, each player computes the bitwise xor of their numbers, and the one with a higher bitwise xor wins. Who will win?

I have described our solution in this comment, but I still feel a bit uneasy about it. It can probably be proven by induction, but it seems that in such a simple setting there should be a simple argument that proves it. I'm really curious to see such an argument, so please share if you know one!

Finally, I've just remembered an interesting bit about the TopCoder problem which I explained in the previous summary, about the attractions, mornings, evenings and constraints. During the contest, I was unable to come up with a max flow solution even though I felt like one was possible, and out of desperation started to implement a backtracking solution that greedily does things that are obviously good to do, branches when there is no obvious next step, and cuts branches that are definitely worse than the current best answer. To my surprise, I received a wrong answer instead of a time limit exceeded on the system test. The reason was that my solution assumed the set of constraints was transitively closed, but I forgot to actually take the transitive closure. After making this fix, the backtracking passed the system test in the practice room in 42ms :)

Thanks for reading, and check back later for more September stories!

A chain cut week

The Sep 11 - Sep 17 week had its contests on the weekend. On Saturday MemSQL Start[c]UP has returned to Codeforces after a three-year absence with Round 1 of its third edition (problems, results, top 5 on the left, my screencast, analysis). moejy0viiiiiv was 25 minutes faster than everybody else to solve all problems, and made sure to protect his lead with a few challenges. Congratulations on the win!

The round had a few nice problems, but let me highlight problem F: there are many contestants taking part in a competition, and the top c will be awarded t-shirts. There are n available t-shirt sizes (n<=200000). Each contestant has indicated one of two things: either they need a t-shirt of some concrete size, or they need a t-shirt of one of two concrete adjacent sizes. More precisely, for each size i we know the number a_i of contestants (up to 10^8) that need a t-shirt of size i, and the number b_i of contestants (also up to 10^8) that need either a t-shirt of size i or a t-shirt of size i+1. What is the minimal number of t-shirts we need to buy in order to be able to give t-shirts to the top c contestants, no matter which ones end up in the top c?

Apart from being an interesting problem in general, it's a great example for this recent discussion about stories in problem statements. I think that here the t-shirt and contestant story actually makes the problem statement really easy to understand and think about, while a completely formal mathematical statement would be hard to parse.

On Sunday the new season of the Open Cup started with the Grand Prix of Romania (problems, results, top 5 on the left). Team Past Glory did not show any signs of slowing down and won convincingly by a problem — well done! The strong performance and second place of the Seoul National U 2 team came as a great surprise, even more so given that this is a different team from the one that earned gold medals for Seoul at the last ICPC World Finals.

Problem L of this contest showed that we can still find new games with non-trivial solutions in a very simple setup: two players have a sequence of 50000 integers, and alternately take numbers from either end of the sequence. When all numbers have been taken, each player computes the bitwise xor of their numbers, and the one with a higher bitwise xor wins. Who will win?

And right after the end of the Grand Prix, Codeforces held its Round 434 (problems, results, top 5 on the left, analysis). The battle for the first place was quite tight. Congratulations to kmjp who came out on top by a mere 25 points!

In my previous summary, I have mentioned a hard TopCoder problem: you are given 50 attractions in a city. You will visit exactly one attraction per day. Each attraction can be visited either in the morning, costing a_i, or in the evening, costing b_i. Every time you visit an attraction in the morning after visiting another attraction the previous evening, you pay an extra cost c. Every time you switch from morning to evening, you pay an extra cost d. Additionally, you are given a directed graph of restrictions, a set of pairs of attractions that constrain the order of visits: in each pair, the first attraction must be visited before the second one. What is the minimal cost to perform all visits?

First of all, let's split our visits into groups: a group of morning visits, then a group of evening visits, then a group of morning visits, and so on (we need to consider two cases based on whether the very first visit is in the morning or in the evening). Since we only pay extra costs when we switch groups, the total cost depends on the number of groups and on the assignment of visits to groups, but not on the order of visits within each group.

Now let's try to build a network in which the cost of a cut corresponds to the total cost of the visits, so that we can solve our problem with a minimum cut algorithm. For each attraction, let us build a chain of vertices, with all chains sharing the source and sink as their start and end respectively. A chain for one attraction looks like: s -> v_1 -> v_2 -> ... -> v_n -> t. Let's add infinite capacity edges going backwards in the chain, to guarantee that any finite cut separates some prefix of this chain from the rest. The size of this prefix will correspond to the number of the group that this attraction belongs to, and thus the capacities of the edges should alternate between a_i and b_i: if the vertex belongs to an even-numbered group, the corresponding edge will contribute a_i to the total capacity of the cut, and otherwise b_i, just as we need.

In order to incorporate a restriction "p must be visited before q" into our network, we will add infinite capacity edges from the vertices of the chain for p to the corresponding vertices of the chain for q. This guarantees that in any finite cut, the position where we cut the chain for p is the same or earlier than the position where we cut the chain for q.

Finally, in order to incorporate the morning/evening switching costs, let us add an extra attraction that must be visited after all others, and set up the capacities on its chain not as alternating a_i and b_i, but rather as 0, c, c+d, 2c+d, 2c+2d, ...

The finite cuts in the resulting network correspond to valid ways of visiting the attractions, and the capacity of the cut is equal to the total visit cost, so now we just need to find the minimum cut. Also note that we have built our network using two relatively standard building blocks: infinite edges can give us constraints on the cut, and a chain of vertices with infinite backwards edges implements a choice from several alternatives with specific costs.
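The "chain with infinite backward edges" building block can be sanity-checked by brute force on a toy network (my own illustration with made-up capacities, not part of the contest solution): any finite cut must cut the chain in exactly one place, so the minimum cut equals the cheapest alternative.

```python
INF = float('inf')

def brute_force_min_cut(nodes, edges, s, t):
    """Minimum s-t cut by trying every subset S containing s but not t;
    the cut value is the total capacity of edges leaving S."""
    others = [v for v in nodes if v not in (s, t)]
    best = INF
    for mask in range(1 << len(others)):
        side = {s} | {others[i] for i in range(len(others)) if mask >> i & 1}
        best = min(best, sum(c for u, v, c in edges
                             if u in side and v not in side))
    return best

# Chain s -> v1 -> v2 -> t with forward capacities 3, 1, 2 and infinite
# backward edges: a finite cut separates a prefix of the chain, so its
# cost is one of 3, 1, 2, and the minimum cut picks the cheapest one.
nodes = ['s', 'v1', 'v2', 't']
edges = [('s', 'v1', 3), ('v1', 'v2', 1), ('v2', 't', 2),
         ('v1', 's', INF), ('v2', 'v1', INF), ('t', 'v2', INF)]
assert brute_force_min_cut(nodes, edges, 's', 't') == 1
```

In the full construction the brute force is of course replaced by a real max flow algorithm; the point here is only that the gadget forces the cut to select exactly one alternative per chain.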

Thanks for reading, and check back soon!

Sunday, December 10, 2017

An almost there week

The Sep 4 - Sep 10 week started with two Codeforces rounds. On Monday, Codeforces Round 432 took place (problems, results, top 5 on the left, analysis). AngryBacon has started with the hardest problem F and solved it very fast, which gave them a clear path to victory. Congratulations!

On Wednesday, Codeforces hosted the next Round 433 (problems, results, top 5 on the left, analysis). In a round with 4 problems solvable within an hour and the fifth one too hard to crack, Um_nik has combined excellent speed with 3 successful challenges for a clear first place. Well done!

On the weekend TopCoder hosted not one but two rounds that finalized the list of finalists for its flagship tournament TopCoder Open 2017. Round 3B took place on Saturday (problems — unavailable now, but hopefully the TopCoder website will be fixed soon, results, top 5 on the left). In order to advance one simply needed to solve either the first two (like qwerty787788, Um_nik and ikatanic did), or the third problem (like rng_58 and RRx did). Congratulations to all advancers!

With 1.5 minutes to go in the challenge phase, I had 344.14 points which would be enough to advance with just the easy problem solved, only to make an incorrect challenge that brought me down to 319.14 which was not enough. Of course I didn't know that 344.14 would be enough during the round itself, but seeing the final scoreboard made me quite annoyed at myself for that -25 :)

Here is the easy problem that brought about the challenge phase excitement: you are given two strings of the same length, which is at most 50. You are allowed to swap corresponding characters of the two strings (the i-th character of the first string with the i-th character of the second string) as many times as you want. Your goal is to obtain two strings whose longest common substring is as long as possible. Can you see the right approach?


In between the TopCoder rounds, Russian Code Cup 2017 Final Round saw the top contestants compete for glory and prizes (problems, results, top 5 on the left, my screencast, analysis). xudyh was about half an hour faster than everybody else on the first 5 problems. I have tried to leapfrog him by trying to squeeze in a suboptimal solution for the last problem that is fast to code, but it did not pass and xudyh has maintained his first place — congratulations on the win!

I have found the easiest problem of this round to be the cutest. You are given a set of 100 distinct integers a_i, each up to a million, and need to find any set of 100 integers b_j, also each up to a million, such that all 100^2 pairwise sums a_i+b_j are distinct. What is the nicest way to achieve that?

And finally, the Wildcard Round of TCO 2017 determined the last two finalists on Sunday (problems — unavailable now, but hopefully the TopCoder website will be fixed soon, results, top 5 on the left, my screencast). Only kuniavski and xudyh have solved the hard problem, so it is only fitting that they qualified for the TCO onsite — great job!

I have once again tried to compensate the poor coding phase performance with lots of successful challenges, but was unable to find enough and got eliminated from the TCO. The hard problem that I couldn't crack stated: you are given 50 attractions in a city. You will visit exactly one attraction per day. Each attraction can be visited either in the morning, costing a_i, or in the evening, costing b_i. Every time you visit an attraction in the morning after visiting another attraction the previous evening, you pay an extra cost c. Every time you switch from morning to evening, you pay an extra cost d. Additionally, you are given a directed graph of restrictions, a set of pairs of attractions that constrain the order of visits: in each pair, the first attraction must be visited before the second one. Can you see the way to use max flow to find the minimal cost to perform all visits?

Thanks for reading, and check back tomorrow for more backfills!

A Chihuahua week

Codeforces hosted its Round 431 during the Aug 28 - Sep 3 week (problems, results, top 5 on the left, analysis). dotorya has managed to solve all four solvable problems in just over one hour, more than 20 minutes faster than the second-placed Reyna and more than 50 minutes faster than everybody else. Well done!

Also during this week, the biannual Petrozavodsk Training Camp for teams taking part in ICPC has completed (results, top 5 on the left). With 9 5-hour contests packed into just 11 days, and many world class teams participating, this is the first detailed comparison of top student teams this season. Congratulations to the Moscow SU team Chihuahua for topping the overall standings with quite a margin!

Thanks for reading, and check back for more.

A yosupo week

Miss me?

The Aug 21 - Aug 27 week started off with TopCoder SRM 720 on Thursday (problems — unavailable now, but hopefully the TopCoder website will be fixed soon, results, top 5 on the left). There was a tense challenge phase battle at the top, but yosupo has made sure it was only for the second place by solving the 1000-pointer — congratulations on the victory!

The said 1000-pointer was concerned with the somewhat theoretical topic of isotonic regression in the L1 metric. On the outside, it looked like this: you are given an undirected weighted graph with at most 50 vertices and at most 1000 edges, and two spanning trees in it. You have to modify the weights of the edges in such a way that the first given spanning tree becomes a minimum spanning tree and the second one becomes a maximum spanning tree, while minimizing the total change of weights.

A few solutions for this problem are described and linked in this comment thread.

Later on the same day we had AIM Tech Round 4 on Codeforces (problems, results, top 5 on the left, analysis). yosupo has continued his great day, this time winning without an extra problem but thanks to solving everything fast enough. Well done!

Finally, Saturday presented us with AtCoder Grand Contest 019 (problems, results, top 5 on the left, my screencast, analysis). This time yosupo was only 6th, and the victory instead went to Um_nik who has dominated the round by finishing all problems with half an hour to spare. Congratulations!

In my previous summary, I have mentioned a Codeforces problem: you are given an array of 300000 numbers, and 300000 queries of the following form: a segment [l;r] and a number k, for which you need to find the smallest number that appears on at least 1/k of all positions in the given segment of the array. Note that k is at most 5, so the number must be quite frequent within the segment.

The first idea is to use Mo's algorithm to handle the segments: we'll keep counts of how many occurrences of each number there are in the current segment in a simple array, and thus can handle the counting for all segments in O(n*sqrt(n)). In order to find the answer for each segment, we'll need to find the counts that are above 1/5 of their sum. One way to do that would be to maintain a priority queue, but that adds an extra log factor both for the queries and for the Mo's part, and some coding complexity.

A nicer way to find the high counts is to use randomization: let's just pick 120 random positions within our segment, and hope that all numbers that occupy at least 1/5 of the segment will be found. Indeed, the probability of missing such a number is less than (1-1/5)^120, or about 2.3e-12. We have at most 5 interesting numbers, 300000 queries, and around 100 testcases, so the overall probability of an incorrect answer does not exceed 5*100*300000*2.3e-12=3.45e-4, which is good enough.
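The sampling step can be sketched as follows (my own illustration; in the real solution the exact count of each candidate comes from the Mo's algorithm counters, while here we count directly for clarity):

```python
import random
from collections import Counter

random.seed(42)  # fixed seed only to make this demo reproducible

def frequent_in_segment(arr, l, r, k, samples=120):
    """Smallest value appearing on at least 1/k of positions in arr[l..r],
    or None.  A value covering at least 1/k of the segment is missed by
    all `samples` independent draws with probability at most
    (1 - 1/k) ** samples."""
    length = r - l + 1
    candidates = {arr[random.randint(l, r)] for _ in range(samples)}
    counts = Counter(arr[l:r + 1])   # exact counts, for clarity
    good = [v for v in candidates if k * counts[v] >= length]
    return min(good) if good else None

arr = [7, 2, 2, 2, 5, 2, 9, 2]
assert frequent_in_segment(arr, 0, 7, 5) == 2   # 2 occupies 5/8 >= 1/5
```

Sampled candidates that are not actually frequent are simply filtered out by the exact count check, so false positives cost nothing; only a missed frequent value could produce a wrong answer.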

Thanks for reading, and check back soon as I crawl towards the present :)

Tuesday, October 24, 2017

Commentary stream for TopCoder Open 2017 Finals

TopCoder Open 2017 final round is today at 19:30 CEST. Tune in at https://youtu.be/qrlDoP3JQkI to hear me, pashka and Endagorion discuss! This time we should have problem statements at the beginning of the round :)

Monday, October 23, 2017

Commentary stream for TopCoder Open 2017 Semifinal 2 with Egor

We'll try to do live commentary of TopCoder Open 2017 Semifinal 2 with Egor on https://youtu.be/AqpjM86iZlc  - tune in at 19:30 CEST!

Sunday, October 8, 2017

A randomized week

The Aug 14 - Aug 20 week contained the usual suspects: a TopCoder SRM and a Codeforces round. First off, TopCoder SRM 719 took place in the early Tuesday hours (problems, results, top 5 on the left, analysis). yanQval was a lot faster than everybody else and won the round - well done!

Codeforces Round 429 followed on Friday (problems, results, top 5 on the left, analysis). anta and LHiC have both solved their final fifth problem with only a few minutes to spare and thus obtained quite a margin at the top of the scoreboard - way to go!

Problem D in this round allowed for a cute randomized solution that in my view is a bit easier than the one from the official analysis. The problem was: you are given an array of 300000 numbers, and 300000 queries of the following form: a segment [l;r] and a number k, for which you need to find the smallest number that appears on at least 1/k of all positions in the given segment of the array. Note that k is at most 5, so the number must be quite frequent within the segment. Can you see how to use that fact to implement a simple randomized solution?

And finally, I've checked out CodeChef August 2017 Cook-Off on Sunday (problems, results, top 5 on the left, my screencast). EgorK was at the top of the standings with 4 problems for quite some time, but gennady.korotkevich has then submitted his fourth and fifth problems in rapid succession and won the competition. Congratulations!

In my previous summary, I have mentioned a problem that I have set for Google Code Jam 2017 finals: you are given a number k between 3 and 10000. Output any simple undirected graph with at most 22 vertices that has exactly k spanning trees.

The approach that works the most frequently in such problems is to learn several transitions that help cover all required numbers. For example, if we could come up with a way to transition from a graph with k spanning trees to graphs with 2k and 2k+1 spanning trees by adding one vertex, the problem would be solved. Alas, this does not seem possible, as it's not clear how to achieve that extra "+1" spanning tree.

As an explicit construction turns out to be hard or impossible, we have to turn to more experimental approaches. One thing that stands out is that there's an enormous number of graphs on 22 vertices, and while they have at most 22^20 spanning trees, many have under 10000 spanning trees, and thus we can hope that for each k in the given range there are many possible answers. So the next idea is to try generating random graphs and checking how many spanning trees those graphs have using the matrix tree theorem, hoping to find all numbers between 3 and 10000 at least once.

It turns out that some amounts of trees are more difficult to achieve than others, so this approach by itself is not sufficient to generate all answers within reasonable time. However, there are numerous different ideas that help it find all required numbers faster, and thus help get the problem accepted. I won't be surprised if everybody who got this problem accepted at the Code Jam has used a different approach :) My experimentation led me down the following path:

  • Since some numbers are hard to get, we try to steer the random graphs towards those numbers. Suppose we have a "goal" number we're trying to achieve. The easiest way to do that is to avoid recreating the graph from scratch; instead, we take the graph from the previous attempt and check whether it has fewer spanning trees than the goal, or more. In the former case, we add a random edge to it, and otherwise remove one. As a result, we're more likely to hit the goal, and as soon as we do, we pick another goal from the yet unseen numbers and repeat the process.
  • This process has an unfortunate bad state: when we remove an edge, we can make the graph disconnected, thus having 0 spanning trees. At this point we start adding random edges, but if the graph stays disconnected, it keeps having 0 spanning trees. Finally, we add an edge that connects the graph back, but since we have added so many other edges in the meantime, suddenly the graph has a lot of spanning trees, way more than we need. So we need to remove a lot of edges to get back to reasonable numbers, and may well jump back to 0 in the process. Having noticed this happening, I've modified the process to never remove an edge that would make the graph disconnected.
  • Finally, we get almost all numbers we need! After a minute or so, my solution has generated all numbers between 3 and 10000 except 13 and 22 (if I remember correctly). The final idea is that small numbers can still be generated by an explicit construction: a cycle with k vertices has k spanning trees. Since we're allowed at most 22 vertices, 13 and 22 can be done using a cycle.
That's all for my solution, but I have one more question for you: why is 22 so hard to get? In particular, does there exist a simple undirected graph with at most 21 vertices that has 22 spanning trees? I strongly suspect that the answer is no, but don't know for sure.
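If you would like to experiment with this question yourself, here is a sketch of spanning tree counting via the matrix tree theorem (the function name is mine; exact arithmetic via fractions keeps the determinant free of rounding error):

```python
from fractions import Fraction

def count_spanning_trees(n, edges):
    """Matrix tree theorem: the number of spanning trees equals any
    cofactor of the graph Laplacian; here, row and column 0 are deleted."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    m = [row[1:] for row in lap[1:]]
    det = Fraction(1)
    for i in range(n - 1):
        # Gaussian elimination with exact fractions, tracking the sign.
        pivot = next((r for r in range(i, n - 1) if m[r][i] != 0), None)
        if pivot is None:
            return 0               # singular minor: graph is disconnected
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, n - 1):
            factor = m[r][i] / m[i][i]
            for c in range(i, n - 1):
                m[r][c] -= factor * m[i][c]
    return int(det)

# A cycle on k vertices has exactly k spanning trees ...
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
assert count_spanning_trees(5, cycle5) == 5
# ... and the complete graph K4 has 4^2 = 16 by Cayley's formula.
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
assert count_spanning_trees(4, k4) == 16
```

With this in hand, one can enumerate small graphs and check whether 22 spanning trees is achievable on fewer than 22 vertices.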

Thanks for reading, and check back later!

Wednesday, September 27, 2017

A week of 22

The Aug 7 - Aug 13 week was centered around Google Code Jam final rounds. First off, the 21 finalists competed in the Distributed Code Jam final round on Thursday (problems, results, top 5 on the left, analysis). ecnerwala has snatched the first place with just 3 minutes left in the contest on his 6th attempt on D-small - here's for perseverance :) Congratulations!

One day later, the traditional Google Code Jam final round saw its 26 finalists compete for the first place (problems, results, top 5 on the left, analysis, commentary stream). Gennady.Korotkevich has won the Code Jam for the fourth year in a row - well done! He has given everybody else a chance this time by solving only two large inputs, but his competitors could not catch up for various reasons. SnapDragon was leading going into the system test phase, but he knew himself that his solution for D-large was incorrect as it was not precise enough. Second-placed zemen could have claimed the victory had he solved C instead of E-small or F-small or both in the last hour.

I have set the said problem C for this round, which went like this: you are given a number k between 3 and 10000. Output any simple undirected graph with at most 22 vertices that has exactly k spanning trees. I don't think it is solvable just in one's head, but if you have some time to experiment on a computer, I think it's an interesting problem to try!

In my previous summary, I have mentioned a TopCoder problem: given a tree with n<=250 vertices, you need to solve a system of at most 250 equations on this tree with m<=250 variables x_1, x_2, ..., x_m, where each variable must correspond to some vertex in the tree. Several variables can correspond to the same vertex. Each equation has the following form: in the path from vertex x_{a_i} to vertex x_{b_i}, the vertex closest to vertex p_i must be the vertex q_i.

If we look at the tree as rooted at vertex p_i, the corresponding equation means that both x_{a_i} and x_{b_i} must be in the subtree of the vertex q_i, and there must be no child of q_i such that both x_{a_i} and x_{b_i} are in the subtree of this child. At this point one had to strongly suspect that this problem would reduce to a 2-SAT instance with boolean variables corresponding to "this variable is in this subtree". The equations are already in 2-SAT form over these boolean variables as described above, but the tricky part is to come up with additional clauses that guarantee that the values of the boolean variables are consistent, so that we can pick one vertex for each variable.

The first difficulty is that the tree is rooted differently for each equation. However, we can notice that each subtree of any re-rooting of the given tree is either a subtree of the tree rooted at the first vertex, or a complement of such a subtree. So if we have boolean variables for "x_i is in the subtree of vertex y when rooted at vertex 1", then the complementary subtrees can be expressed simply as negations of those variables, as each x_i must go to some vertex.

The second difficulty is ensuring that when we look at all subtrees which contain a given x_i according to our boolean variables, they have exactly one vertex in their intersection that we can assign x_i to. It turns out that this can be achieved using simple clauses "if x_i is in the subtree of a vertex, it's also in the subtree of its parent, and not in the subtrees of its siblings", which are handily in 2-SAT form. We can then find the deepest vertex containing x_i in its subtree according to the values of the boolean variables, and it's not hard to see that all clauses will be consistent when x_i is placed in that vertex.
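These consistency clauses can be sanity-checked by brute force on a tiny tree (my own toy check, not the contest solution): together with the requirement that x_i lies in the root's subtree, the satisfying assignments are exactly the root-to-vertex paths, one per possible placement of x_i.

```python
from itertools import product

# Tiny rooted tree: 0 is the root, parent[v] gives the parent of v.
parent = {1: 0, 2: 0, 3: 1}
vertices = [0, 1, 2, 3]
children = {v: [u for u in parent if parent[u] == v] for v in vertices}

def satisfies(in_sub):
    """in_sub[v] == True means 'x lies in the subtree of v'."""
    if not in_sub[0]:             # x must lie somewhere, i.e. under the root
        return False
    for v, p in parent.items():
        if in_sub[v] and not in_sub[p]:   # in a subtree => in parent's subtree
            return False
        for sibling in children[p]:
            if sibling != v and in_sub[v] and in_sub[sibling]:
                return False              # never in two siblings' subtrees
    return True

solutions = [assign for assign in product([False, True], repeat=4)
             if satisfies(dict(zip(vertices, assign)))]
# Exactly one satisfying assignment per vertex: the set of true variables
# is the path from the root down to that vertex.
assert len(solutions) == len(vertices)
```

The brute force confirms that the deepest "true" vertex of each satisfying assignment is a valid placement, matching the argument above.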

Thanks for reading, and check back later for more competitive programming news!

Sunday, September 24, 2017

A 2-SAT week

The contests of July 31 - Aug 6 week started with the second competition day of IOI 2017 on Tuesday (problems, overall results, top 5 on the left). As expected, only the three highest scorers of day 1 had a real chance for the first place, and Yuta Takaya has grabbed this chance with the amazing score of 300 out of 300. Well done!

On Saturday, my fruitless attempts to qualify for the TopCoder Open onsite in Buffalo started with TopCoder Open 2017 Round 3A (problems, results, top 5 on the left, analysis). tourist has claimed the first place thanks to solving both the easy and the hard problem - congratulations - but qualification was also possible with easy+challenges, as Scott and Igor have demonstrated.

I could not figure out the solution for the hard problem in time, even though I suspected it would involve a reduction to 2-SAT from the very beginning. Can you see the way?

You were given a tree with n<=250 vertices and needed to solve a system of at most 250 equations on this tree with m<=250 variables x_1, x_2, ..., x_m, where each variable must correspond to some vertex in the tree. Several variables can correspond to the same vertex. Each equation has the following form: in the path from vertex x_{a_i} to vertex x_{b_i}, the vertex closest to vertex p_i must be the vertex q_i.

Somewhat orthogonally to this particular problem, I felt like my general skill of reducing to 2-SAT could use some improvement. It seems that there are some common tricks that allow replacing complex conditions with just "or"s of two variables. For example, suppose we want a constraint "at most one of this subset of variables is true". It could naively be translated into n^2 2-SAT clauses. However, we can introduce auxiliary variables to get by with only O(n) clauses: "b_i must be true when at least one of the first i original variables a_i is true", with clauses b_i -> b_{i+1} and a_i -> b_i. Disallowing two true variables is then done via the clause b_i -> !a_{i+1}.
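This encoding can be verified by brute force for small n (a sketch of my own, just to double-check the clause set): the clauses are satisfiable for some assignment of the auxiliary b's exactly when at most one of the a's is true.

```python
from itertools import product

def at_most_one_clauses(n):
    """O(n) 2-SAT clauses enforcing 'at most one of a_1..a_n is true',
    using auxiliary prefix variables b_1..b_n ('some a_j with j <= i is
    true').  A literal is ('a'|'b', index, polarity)."""
    clauses = []
    for i in range(1, n + 1):
        clauses.append((('a', i, False), ('b', i, True)))           # a_i -> b_i
        if i < n:
            clauses.append((('b', i, False), ('b', i + 1, True)))   # b_i -> b_{i+1}
            clauses.append((('b', i, False), ('a', i + 1, False)))  # b_i -> !a_{i+1}
    return clauses

def satisfiable_with(a_values):
    """Is there an assignment of the b's consistent with these a's?"""
    n = len(a_values)
    clauses = at_most_one_clauses(n)
    for b_values in product([False, True], repeat=n):
        value = {'a': (None,) + tuple(a_values), 'b': (None,) + b_values}
        if all(any(value[name][i] == pol for name, i, pol in clause)
               for clause in clauses):
            return True
    return False

n = 4
for a_values in product([False, True], repeat=n):
    assert satisfiable_with(a_values) == (sum(a_values) <= 1)
```

The same prefix-variable idea generalizes to "at most k" constraints at the cost of k auxiliary chains, though that is beyond what plain 2-SAT clauses can express for k > 1 without further tricks.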

What are the other cool 2-SAT reduction tricks?

Thanks for reading, and check back as I continue to remember August and September :)

Sunday, July 30, 2017

A week with two trees

This week's contests were clustered on Sunday. In the morning, the first day of IOI 2017 took place in Tehran (problems, results, top 5 on the left, video commentary). The "classical format" problems wiring and train were pretty tough, so most contestants invested their time into the marathon-style problem nowruz. Three contestants have managed to gain a decent score there together with 200 on the other problems, so it's quite likely that they will battle for the crown between themselves on day 2. Congratulations Attila, Yuta and Mingkuan!

A few hours later, Codeforces Round 426 gave a chance to people older than 20 :) (problems, results, top 5 on the left). The round contained a bunch of quite tedious implementation problems, and Radewoosh and LHiC have demonstrated that they are not afraid of tedious implementation by solving 4 out of 5. Well done!

In my previous summary, I have raised a sportsmanship question, asking whether it's OK to bail out of an AtCoder contest without submitting anything. The poll results came back as 42% OK, 28% not OK, and 30% saying that the contest format should be improved. As another outcome of that post, tourist and moejy0viiiiiv have shared their late submission strategies that do not consider bailing out as a benefit, so I was kind of asking the wrong question :) I encourage you to read the linked posts to see how the AtCoder format actually makes the competition more enjoyable.

I have also mentioned a bunch of problems in that post, so let me come back to one of them: you are given two rooted trees on the same set of 10^5 vertices. You need to assign integer weights to all vertices in such a way that for each subtree of each of the trees the sum of weights is either 1 or -1, or report that it's impossible.

First of all, we can notice that the value assigned to each vertex can be computed as (the sum of its subtree) - (the sum of subtrees of all its children). Since all said sums are either -1 or 1, the parity of this value depends solely on the parity of the number of children. So we can compare said parities in two trees, and if there's a mismatch, then there's definitely no solution.
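A minimal sketch of just this parity check, assuming the trees are given as parent arrays (with parent[root] = -1):

```python
# Necessary condition: in both trees, each vertex's number of children must
# have the same parity, since the value at a vertex equals (its subtree sum)
# minus (the sum over its children's subtrees), a sum of 1 + #children odd
# terms. The parent-array representation is an assumption of this sketch.
def parity_mismatch(parent1, parent2):
    n = len(parent1)

    def child_counts(parent):
        cnt = [0] * n
        for v, p in enumerate(parent):
            if p >= 0:
                cnt[p] += 1
        return cnt

    c1, c2 = child_counts(parent1), child_counts(parent2)
    return any((a ^ b) & 1 for a, b in zip(c1, c2))
```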

Now, after playing a bit with a few examples, we can start to believe that in case all parities match, there is always a solution. During the actual contest I've implemented a brute force to make sure this is true. Moreover, we can make an even stronger hypothesis: there is always a solution where all values are -1,0 or 1. Again, my brute force has confirmed that this is most likely true.

For vertices where the value must be even, we can assume it's 0. For vertices where the value must be odd, we have two choices: -1 or 1. Let's call the latter odd vertices.

Our parity check guarantees that each subtree of each tree will have an odd number of odd vertices, in order to be able to get -1 or 1 in total. So we must either have k -1's and k+1 1's, or k+1 -1's and k 1's. Here comes the main idea: in order to achieve this, it suffices to split all odd vertices into pairs in such a way that in each subtree all odd vertices but one form k whole pairs, and each pair is guaranteed to have one 1 and one -1.

Having said that idea aloud, we can very quickly find a way to form the pairs: we would just go from leaves to the root of the tree, and each subtree will pass at most one unpaired odd vertex to its parent. There we would pair up as many odd vertices as possible, and send at most one further up.
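A sketch of this leaves-to-root pairing, again assuming a parent-array representation:

```python
# Pair up the odd vertices of a rooted tree going from the leaves to the
# root: each subtree passes at most one unpaired odd vertex to its parent,
# where we pair up as many pending odd vertices as possible.
def pair_odd_vertices(parent, odd):
    n = len(parent)
    children = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p < 0:
            root = v
        else:
            children[p].append(v)
    order, stack = [], [root]
    while stack:  # iterative DFS preorder; reversed, it is leaves-first
        v = stack.pop()
        order.append(v)
        stack += children[v]
    pairs, carry = [], [None] * n
    for v in reversed(order):
        pending = [v] if odd[v] else []
        pending += [carry[c] for c in children[v] if carry[c] is not None]
        while len(pending) >= 2:
            pairs.append((pending.pop(), pending.pop()))
        carry[v] = pending[0] if pending else None
    return pairs, carry[root]
```

When every subtree contains an odd number of odd vertices, each carry is exactly one vertex, and within each subtree all odd vertices except the carried one are paired inside it.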

Since we have not one, but two trees, we will get two sets of pairs. Can we still find an assignment of 1's and -1's that satisfies both sets? It turns out we can: the necessary and sufficient condition for such assignment to exist is for the graph formed by the pairs to be bipartite. And sure enough, since the graph has two types of edges, and each vertex has at most one edge of each type, any cycle must necessarily have the edge types alternate, and thus have even length. And when all cycles have even length, the graph is bipartite.
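The final sign assignment is then a plain 2-coloring of the graph formed by the pairs; here is a sketch (the interface is my own; vertices outside all pairs keep sign 0, standing for the even vertices):

```python
# Assign +1/-1 to the paired vertices so that every pair, from either tree,
# gets one +1 and one -1: 2-color the graph whose edges are the pairs.
def assign_signs(n, pairs1, pairs2):
    adj = [[] for _ in range(n)]
    for a, b in pairs1 + pairs2:
        adj[a].append(b)
        adj[b].append(a)
    sign = [0] * n  # vertices outside all pairs keep 0
    for s in range(n):
        if sign[s] != 0 or not adj[s]:
            continue
        sign[s] = 1
        stack = [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if sign[w] == 0:
                    sign[w] = -sign[v]
                    stack.append(w)
                elif sign[w] == sign[v]:
                    return None  # odd cycle
    return sign
```

The None branch corresponds to an odd cycle, which the alternating-edge-types argument above rules out for pairings coming from the two trees.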

Looking back at the whole solution, we can see that the main challenge is to restrict one's solution in the right way. When we work with arbitrary numbers, there are just too many dimensions in the problem. When we restrict the numbers to just -1, 0, and 1, the problem becomes more tractable, and it becomes easier to see which ideas work and which don't. And when we add the extra restriction in form of achieving the necessary conditions by pairing up the vertices, the solution flows very naturally.

Thanks for reading, and check back next week!

Sunday, July 23, 2017

An unsportsmanlike week

Yandex.Algorithm 2017 Final Round took place both in Moscow and online on Tuesday (problems, results, top 5 on the left). My contest started like a nightmare: I was unable to solve any of the given problems during the first hour, modulo a quick incorrect heuristic submission on problem F. As more and more people submitted problems D (turned out to be a straightforward dynamic programming) and E (turned out to be about finding the n-th Catalan number), I grew increasingly frustrated, but still couldn't solve them. When my wits finally came back after an hour, all problems suddenly seemed easy, and I did my best to catch up with the leaders, submitting 5 problems. Unfortunately, one of them turned out to have a very small bug, and I was denied that improbable victory :)

Congratulations to tourist, W4yneb0t and rng.58 on winning the prizes!

Here's the Catalan number problem that got me stumped. Two people are dividing a heap of n stones. They take stones in turns until none are left. At their turn, the first person can take any non-zero amount of stones. The second person can then also take any non-zero amount of stones, but with an additional restriction that the total amount of stones he has after this turn must not exceed the total amount of stones the first person has. How many ways are there for this process to go to completion, if we also want the total amount of stones of each person to be equal in the end?
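To see the Catalan numbers appear, here is a small brute force over all ways the process can go, under my reading of the rules (a check, not the intended solution):

```python
from functools import lru_cache

# Count the ways the stone-taking process can end with equal totals.
# State: stones remaining, totals of the two players, whose turn it is.
def count_ways(n):
    @lru_cache(maxsize=None)
    def rec(left, first, second, turn):
        if left == 0:
            return 1 if first == second else 0
        total = 0
        for take in range(1, left + 1):
            if turn == 0:
                total += rec(left - take, first + take, second, 1)
            elif second + take <= first:  # the second player may never overtake
                total += rec(left - take, first, second + take, 0)
        return total

    return rec(n, 0, 0, 0)
```

For small even n the counts come out as 1, 2, 5, 14, matching the Catalan numbers.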

TopCoder Open 2017 Round 2C was the first event of the weekend (problems, results, top 5 on the left, parallel round results, my screencast). The final 40 contestants for Round 3 were chosen, and kraskevich advanced with the most confidence of all, as he was the only contestant to solve all problems. Congratulations!

This round contained a very cute easy problem. Two teams with n players each are playing a game. Each player has an integer strength. The game lasts n rounds. In the first round, the first player of the first team plays the first player of the second team, and the stronger player wins. In the second round, the first two players of the first team play the first two players of the second team, and the pair with the higher total strength wins. In the i-th round the strengths of the first i players in each team are added up to determine the winner. You know the strengths of each player, and the ordering of the players in the second team. You need to pick the ordering for the players in the first team in such a way that the first team wins exactly k rounds. Do you see the idea?

And finally, AtCoder Grand Contest 018 took place on Sunday (problems, results, top 5 on the left, analysis, my screencast with commentary). tourist has adapted his strategy to the occasion, submitting the first four problems before starting work on the last two, but in the end that wasn't even necessary as he also solved the hardest problem and won with a 15 minute margin. Well done!

As you can see in the screencast, I was also trying out this late submission strategy this time, and when I solved the first four and saw that Gennady has already submitted them, I was quite surprised. There was more than an hour left, so surely I'd be able to solve one more problem? And off I went, trying to crack the hardest problem because it gave more points and seemed much more interesting to solve than the previous one.

I have made quite a bit of progress, correctly simplifying the problem to the moment where the main idea mentioned in the analysis can be applied, but unfortunately could not come up with that idea. That left me increasingly anxious as the time ran out: should I still submit the four problems I have (which turned out to be all correct in upsolving), earning something like 40th place instead of 10th or so that I would've got had I submitted them right after solving? Or should I avoid submitting anything, thus not appearing in the scoreboard at all and not losing rating, but showing some possibly unsportsmanlike behavior?

I have to tell you, this is not a good choice to have. Now I admire people who can pull this strategy off without using the escape hatch even more :) As a reminder, the benefits of this strategy, as I see them (from comments in a previous post), are:
1) Not giving information to other competitors on the difficulty of the problems.
2) Not allowing other competitors to make easy risk/reward tradeoffs, as if they know the true scoreboard, they might submit their solutions with less testing when appropriate.

I ended up using the escape hatch, which left me feeling a bit guilty, but probably more uncertain than guilty. Do you think this is against the spirit of competition, as PavelKunyavskiy suggests? Please share your opinion in comments, and also let's have a vote:

Is it OK to bail out of a contest after a poor performance with late submit strategy at AtCoder?

Here's the hardest problem that led to all this confusion: You are given two rooted trees on the same set of 10^5 vertices. You need to assign integer weights to all vertices in such a way that for each subtree of each of the trees the sum of weights is either 1 or -1, or report that it's impossible. Can you do that?

In my previous summary, I have mentioned a hard VK Cup problem. You are given an undirected graph with 10^5 vertices and edges. You need to assign non-negative integer numbers not exceeding 10^6 to the vertices of the graph. The weight of each edge is then defined as the product of the numbers at its ends, while the weight of each vertex is equal to the square of its number. You need to find an assignment such that the total weight of all edges is greater than or equal to the total weight of all vertices, and at least one number is nonzero.

The first idea is: as soon as the graph has any cycle, we can assign number 1 to all vertices in the cycle, and 0 to all other vertices, and we'll get what we want. So now we can assume the graph is a forest, and even a tree.

Now, consider a leaf in that forest, and assume we know the number x assigned to the only vertex this leaf is connected to. If we now assign number y to this leaf, then the difference between the edge weight and the vertex weight will increase by x*y-y*y. We need to pick y to maximize this difference, which amounts to maximizing a quadratic function: the maximum is achieved at y=x/2, and the difference increases by x^2/4.

Now suppose we have a vertex that is connected to several leaves, and to one other non-leaf vertex for which we know the assigned number x. After we determine the number y for this vertex, all adjacent leaves must be assigned y/2, so we can again compute the difference as the function of y and find its maximum. It will be a quadratic function again, and the solution will look like x*const again.

Having rooted the tree in some way, this approach allows us to actually determine the optimal number for all vertices going from leaves to the root as multiples of the root number, and we can check if the overall delta is non-negative or not.
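Here is a sketch of this leaves-to-root computation in exact rationals, deciding only whether a non-negative overall difference is achievable with a nonzero assignment, and ignoring the integer-size complication for now (parent-array input is my assumption):

```python
from fractions import Fraction

# Bottom-up over the rooted tree: the best difference a subtree can
# contribute, as a function of the value x at its parent, is coef[v] * x^2.
# A leaf gives x^2/4; an inner vertex whose children's coefficients sum to
# s gives x^2 / (4 * (1 - s)) when s < 1, and an unbounded difference when
# s >= 1. At the root there is no parent edge, so we need s >= 1 there.
def nonneg_delta_possible(parent):
    n = len(parent)
    children = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p < 0:
            root = v
        else:
            children[p].append(v)
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack += children[v]
    coef = [Fraction(0)] * n
    for v in reversed(order):
        s = sum((coef[c] for c in children[v]), Fraction(0))
        if v == root:
            return s >= 1  # root difference is (s - 1) * y^2
        if s >= 1:
            return True  # the difference inside this subtree is unbounded
        coef[v] = Fraction(1, 4) / (1 - s)
```

For example, a single edge or a three-vertex path cannot reach a non-negative difference, while a star with four leaves achieves exactly zero (center 2, each leaf 1: edges 4*2 = 8, vertices 4 + 4 = 8).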

There's an additional complication caused by the fact that the resulting weights must be integers that are not too big. We can do the above computation in rational numbers to make all numbers integer, but they might become quite big. However, if we look at the weight differences that some small trees get, we can notice that almost all trees can get a non-negative difference, and come up with a case analysis that finds a subtree in a given tree which still has a non-negative difference but can be solved with small numbers. I will leave this last part as an exercise for the readers.

Wow, it feels good to have caught up with the times :) Thanks for reading, and check back next week!

A red-black week

Last week, Codeforces presented the problems from the VK Cup finals as two regular contests. First off, Codeforces Round 423 took place on Tuesday (problems, results, top 5 on the left, analysis). W4yneb0t has continued his string of excellent Codeforces performances with the best possible result - a victory, which has also catapulted him to the second place in the overall rating list. Well done!

In between the Codeforces rounds, TopCoder held its SRM 718 very early on Thursday (problems, results, top 5 on the left, analysis). The round seems to have been developing in a very exciting manner: three people submitted all three problems during the coding phase, then two of them lost points during the challenge phase, and the last remaining person with three problems failed the system test on the easiest problem! When the dust settled, snuke still remained in first place thanks to his solution to the hard problem holding up. Congratulations on the second SRM victory!

Codeforces Round 424 wrapped up the week's contests (problems, results, top 5 on the left, analysis). TakanashiRikka was the fastest to solve the first four problems, and (probably not a sheer coincidence :)) the only contestant to have enough time to consider all cases in the hardest problem correctly while solving at least one other problem. Congratulations on the well-deserved first place!

Here's that tricky problem. You are given an undirected graph with 10^5 vertices and edges. You need to assign non-negative integer numbers not exceeding 10^6 to the vertices of the graph. The weight of each edge is then defined as the product of the numbers at its ends, while the weight of each vertex is equal to the square of its number. You need to find an assignment such that the total weight of all edges is greater than or equal to the total weight of all vertices, and at least one number is nonzero.

In my previous summary, I have mentioned a difficult IPSC problem: we start with a deck of 26 red and 26 black cards, and a number k (1<=k<=26). The first player takes any k cards from the deck, and arranges them in any order they choose. The second player takes any k cards from the remaining deck, and arranges them in any order they choose, but such that their sequence is different from the sequence of the first player. The remaining 52-2k cards are shuffled and dealt one by one. As soon as the last k dealt cards exactly match one of the player's sequences, that player wins. In case no match happens after the cards run out, we toss a coin, and each player wins with probability 50%. What is the probability of the first player winning, assuming both play optimally?

Solving this problem required almost all important programming contest skills: abstract mathematical reasoning, knowledge of standard algorithms, coming up with new ideas, good intuition about heuristics, and of course the programming skill itself.

We start off by noticing that the second player has a very simple way to achieve 50% winrate: he can just choose a sequence that is a complement of the first player's sequence (replace red cards by black and vice versa), and then everything is completely symmetric.

How can the second player achieve more? He has two resources: first, he can choose a string that is more likely to appear in the sequence of the remaining cards. Second, he can choose a string that, when it appears together with the string of the first player, tends to appear earlier.

The strings that are more likely to appear are those that leave an equal proportion of reds and blacks (after taking out the string of the first player once and the string of the second player twice), and have no borders (prefixes that are equal to suffixes). This is because we can count the number of ways a given string can appear by multiplying the number of positions it can appear in by the number of ways to place the remaining characters after the matching part is fixed. The number of ways to place the remaining characters is maximized when the remaining characters have equal numbers of blacks and reds. This slightly overcounts the number of ways because in some cases the string can appear more than once; the lack of borders minimizes the number of such occurrences.

The strings that tend to appear earlier when both appear are those which have a suffix which matches a prefix of the first player's string. At best, if the first player's string is s+c, where s is a string of length k-1 and c is a character, the second player should pick his string from 'r'+s and 'b'+s. In this case, as soon as there's a match of the first player's string not in the first position, we have a >50% chance of a match of our string one position earlier.

Now we can already make the first attempt at a solution: let's try likely candidates for the first player's best move - it should likely be among the strings that have the most appearances; the second player should then choose either another string with lots of appearances, or a string that counter-plays the first player's string in the manner described above. However, this is not enough to solve the problem - we will get a wrong answer.

As part of implementing the above solution, we had to also implement the function to count the sought probability for the given pair of strings. It's also not entirely trivial, and can be done by using dynamic programming where the state is the number of remaining red and black cards, and the state in the Aho-Corasick automaton of the two strings.
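For small k, the Aho-Corasick state can be replaced by just the last k-1 dealt cards, which makes for a compact sketch of this computation (my own simplification, not the contest implementation):

```python
from functools import lru_cache

# Probability that the first player wins for a fixed pair of strings over
# the alphabet {'r', 'b'}: the deck holds 26 of each color minus the cards
# the players took; the state is (reds left, blacks left, last k-1 cards).
# If neither string appears, the coin toss contributes 0.5.
def first_player_win_probability(s1, s2):
    k = len(s1)
    reds = 26 - s1.count('r') - s2.count('r')
    blacks = 26 - s1.count('b') - s2.count('b')

    @lru_cache(maxsize=None)
    def rec(r, b, suffix):
        # Returns (p1, p2): probabilities that each string appears first.
        if r + b == 0:
            return (0.0, 0.0)
        p1 = p2 = 0.0
        for count, ch in ((r, 'r'), (b, 'b')):
            if count == 0:
                continue
            w = count / (r + b)
            window = suffix + ch
            if window == s1:
                p1 += w
            elif window == s2:
                p2 += w
            else:
                nsuf = window[-(k - 1):] if k > 1 else ''
                a, c = rec(r - (ch == 'r'), b - (ch == 'b'), nsuf)
                p1 += w * a
                p2 += w * c
        return (p1, p2)

    p1, p2 = rec(reds, blacks, '')
    return p1 + 0.5 * (1.0 - p1 - p2)
```

Running it on the pairs of strings from the table below reproduces the listed values for small k.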

So, where do we go from there? Since we already have the function that computes the probability, we can now run it on all pairs of strings for small values of k and try to notice a pattern. We get something like this:

1 0.5 r b
2 0.5 rb br
3 0.3444170488792196 rbr rrb
4 0.35992624362382514 rrbb rrrb
5 0.3777939526283981 rrbrb brrbr
6 0.413011479190688 rbrbrr brbrbr
7 0.45319632265323256 rrbrrbb brrbrrb
8 0.4782049196004824 rrbbrrbb brrbbrrb

No obvious pattern seems to appear. However, we can notice that for large values of k, more precisely when 3k>52, the answer will be 0.5 simply because there are not enough remaining cards for either string to appear. So we only need to research the values of k between 9 and 17 now.

And here comes another key idea: we need to believe that by cutting enough branches early, our exhaustive search solution can run in a few minutes for all those values. At first, this seems improbable. For example, for k=16 we have 65536 candidates for each string, and four billion combinations in total, not to mention the Aho-Corasick on the inside. However, from our previous attempts at a solution we have some leads. More precisely, we know which strings of the second player are the most likely good answers for each string of the first player.

This allows us to get a good upper bound on the first player's score for each particular string reasonably quickly, which leads to the following optimization idea: let's run the search for all strings of the first player at the same time, and at each point we will take the "most promising" string - the one with the highest upper bound so far - and make one more step of the search for it, in other words try one more candidate for the second player's string, which may lower its upper bound. We continue this process until we arrive at the state where the most promising candidate does not have anything else to try, because we already ran through all possible second player strings for it - and this candidate then gives us the answer.

This search runs relatively fast because for most first player strings, our heuristics will give us an upper bound that is lower than the ultimate answer very quickly, and we will stop considering those strings further. It is still quite slow for larger values of k, so we need a second optimization on top: we can skip the Aho-Corasick part in the simple case where there are simply not enough cards of some color for the second player's string to appear. With those two optimizations, we can finally get all the answers in a few minutes.

Thanks for reading, and check back soon for this week's summary!

Sunday, July 16, 2017

A postcard week

The July 3 - July 9 week had a pretty busy weekend. On Saturday, IPSC 2017 has gathered a lot of current contestants together with veterans who come out of retirement just for this contest every year (problems, results, top 5 on the left, analysis). The (relatively) current contestants prevailed this time, with team Past Glory winning with a solid 2 point gap. Well done!

They were one of only two teams that managed to solve the hardest problem L2. It went like this: we start with a deck of 26 red and 26 black cards, and a number k (1<=k<=26). The first player takes any k cards from the deck, and arranges them in any order they choose. The second player takes any k cards from the remaining deck, and arranges them in any order they choose, but such that their sequence is different from the sequence of the first player. The remaining 52-2k cards are shuffled and dealt one by one. As soon as the last k dealt cards exactly match one of the player's sequences, that player wins. In case no match happens after the cards run out, we toss a coin, and each player wins with probability 50%. What is the probability of the first player winning, assuming both play optimally?

Just an hour later, TopCoder Open 2017 Round 2B selected another 40 lucky advancers (problems, results, top 5 on the left, analysis, parallel round results, my screencast). dotory1158 had a solid point margin from his solutions, and did not throw it all away during the challenge phase, although the contest became much closer :) Congratulations on the win!

The final round of VK Cup 2017 took place in St Petersburg on Sunday (problems, results, top 5 on the left). Continuing the Snackdown trend from the last week, two-person teams were competing. The xray team emerged on top thanks to a very fast start - in fact, they already got enough points for the first place at 1 hour and 38 minutes into the contest, out of 3 hours. Very impressive!

And finally, AtCoder hosted its Grand Contest 017 on Sunday as well (problems, results, top 5 on the left, analysis). Once again the delayed submit strategy has worked very well for tourist, but this time the gap was so huge that the strategy choice didn't really matter. Way to go, Gennady!

Problem D in this round was about the well-known game of Hackenbush, more precisely green Hackenbush: you are given a rooted tree. Two players alternate turns, in each turn a player removes an edge together with the subtree hanging on this edge. When a player can not make a move (only the root remains), he loses. Who will win when both players play optimally?

If you haven't seen this game before, then I encourage you to try solving the problem before searching for the optimal strategies on the Internet (which has them). I think the solution is quite beautiful!

Thanks for reading, and check back for this week's summary.

A hammer week

TopCoder has hosted two SRMs during the June 26 - July 2 week. First off, SRM 716 took place on Tuesday (problems, results, top 5 on the left, analysis). ACRush came back to the SRMs after more than a year, presumably to practice for the upcoming TCO rounds. He scored more points than anybody else from problem solving - but it was only good enough for the third place, as dotory1158 and especially K.A.D.R shined in the challenge phase. Well done!

Later on the same day, Codeforces held its Round 421 (problems, results, top 5 on the left, analysis). There was a certain amount of unfortunate controversy with regard to problem A, so the results should be taken with a grain of salt - nevertheless, TakanashiRikka was the best on the remaining four problems and got the first place. Congratulations!

The second TopCoder round of the week, SRM 717, happened on Friday (problems, results, top 5 on the left, analysis, my screencast). I had my solution for the medium problem fail, but even without that setback I would finish behind Deretin - great job on the convincing win!

Here's what that problem was about. You are given two numbers n (up to 10^9) and m (up to 10^5). For each i between 1 and m, you need to find the number of permutations of n+i objects such that none of the first i objects is left in place by the permutation. As an example, when n=0 we're counting derangements of each size between 1 and m.
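One straightforward way to count these is inclusion-exclusion over which of the first i objects stay fixed; it is too slow for the actual limits, but serves as a sanity check:

```python
from math import comb, factorial
from itertools import permutations

# Number of permutations of n + i objects in which none of the first i
# objects stays in place: inclusion-exclusion over fixed subsets of them.
# This is O(i) per query, so O(m^2) overall; the real constraints call
# for something faster (the post used FFT).
def count_no_fixed_prefix(n, i):
    return sum((-1) ** j * comb(i, j) * factorial(n + i - j)
               for j in range(i + 1))

# Brute-force cross-check for tiny inputs.
def count_brute(n, i):
    return sum(1 for p in permutations(range(n + i))
               if all(p[x] != x for x in range(i)))
```

For n=0 this produces the derangement numbers 0, 1, 2, 9, ... as expected.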

My solution for this problem involved the Fast Fourier Transform because, well, I had a hammer, and the problem was not dissimilar enough to a nail :) And it failed because I've reused FFT code from my library, which I've optimized heavily to squeeze under the time limit in a previous Open Cup round, and to which I've introduced a small bug during that optimization :(

And on the weekend, two-person teams competed for the fame and monetary prizes at the onsite finals of CodeChef Snackdown 2017 (problems, results, top 5 on the left, analysis). The last year's winners "Messages compress" were in the lead for long periods of time, but in the last hour they tried to get three more problems accepted and got zero, which gave other teams a chance. Team Dandelion seized that chance and won, solving 9 problems out of 10. Way to go!

Thanks for reading, and check back for the next week's summary.

FBHC2017 Finals

There was one quite important competition that I forgot to mention two summaries back: Facebook Hacker Cup 2017 onsite finals took place on June 14 (problems, results, top 5 on the left, analysis, my screencast). In this round I have managed to introduce three different bugs in my solution to the second problem, and the way those bugs combined led to my program passing the samples almost by accident, but of course it did not pass the final testing. On the bright side, not spending more time on this problem allowed me enough time to solve all other problems, so maybe the three bugs were actually a crucial component of the victory :)

Thanks for reading, and check back for the hopefully more complete next week's summary!