Sunday, July 5, 2020

A Bresenham's week

The May 11 - May 17 week started with Codeforces Round 641 (problems, results, top 5 on the left, analysis). It was quite challenging for some contestants, and nobody could solve more than 4 problems out of 7. Um_nik solved his four in just one half of allotted time and won the round, congratulations!

TopCoder SRM 786 followed on Friday (problems, results, top 5 on the left, analysis). bqi343 had a 50-point lead after the coding phase, and increased it to 100 points during the challenge phase. Congratulations on the victory! These 5 qualification points have also made him a huge favorite to qualify to the TCO from the third online stage.

On Saturday Google held Code Jam 2020 Round 2 (problems, results, top 5 on the left). Placing in the top 1000 was the main goal in this round, but still Benq's jump to the first place with just five minutes remaining in the round was quite impressive, well done!

Open Cup 2019-20 Grand Prix of Serbia wrapped up the week on Sunday (results, top 5 on the left). Teams Polish Mafia, USA1 and NNSU Almost Retired solved 12 problems each, and the deciding factor was Polish Mafia's significantly fewer incorrect attempts. Congratulations on the win!

It was not announced before the round, but this Grand Prix was the last one of the Open Cup 2019-20 season (overall standings, top 10 on the left). The standings were very close at the top, but in the end we have a new champion. For only the second time since 2013, the champion team's last name list does not include Korotkevich :) Congratulations to the team USA1, and looking forward to even more competition at the top in the coming season!

In my previous summary, I have mentioned an Open Cup problem: a string of 0s and 1s is called balanced if the number of 1s in any two its circular substrings of the same length differs by at most 1. A circular substring of a string s is any string with the length not exceeding the length of s that can be read after we write s on a circle; in other words, it's either a normal substring of s, or a suffix of s concatenated with a prefix of s with total length not exceeding the length of s. You are given a string consisting of 0s, 1s and ?s. How many ways are there to replace each ? with a 0 or a 1 such that the resulting string is balanced? The length of the given string is at most 1024.

This setting reminded me of Bresenham's line drawing algorithm, so I tried to get balanced strings like this: let's draw a line from point (0,0) to point (x,y), then let's keep walking starting from point (0,0). We walk like this: if we are below the line, then we go up (increase the y-coordinate by 1), and if we are above the line, then we go right (increase the x-coordinate by 1). If we are on the line, then we go up unless the line is horizontal (y=0), in that case we always go right. We will reach the point (x,y) in x+y steps, and let's write down a 0 every time we go right, and a 1 every time we go up.

Intuitively, such strings should be balanced, as we're always walking very close to the line and all substrings of the same size should represent almost the same vector. Circular shifts of a balanced string are also balanced, so we need to also consider the circular shifts of the strings we get that way. It's not at all clear if there are other balanced strings, though. Imagine how happy I was when I found out that considering only such strings gets correct answers on all samples, and then was accepted by the judging system!

The resulting code is very simple, since the strings we get for different pairs (x,y) are all different, so we just need to find the period of each such string to determine the number of its different circular shifts, and do not need to otherwise check the strings we get for repetitions.

This was one of those rare cases where submitting a solution solely based on intuition and without having even a sketch of a proof worked out :)

Thanks for reading, and check back for more!

Sunday, June 14, 2020

A week without paradox

TopCoder SRM 785 took place during the May 4 - May 10 week (problems, results, top 5 on the left, analysis). The hard problem had this feeling of "it's amazing nobody has asked this before", and it turned out that somebody did :) mochalka had seen this problem before and was the only contestant to get it accepted, winning the match — even in such situation it's important to execute well given the chance, so well done! It turned out that the only bug in my solution was that I was iterating up to bit 29 instead of bit 30 in one place, and it passed in the practice room with that one-byte change. It was disappointing to figure out all tricky details correctly to only fail in that manner :)

Open Cup 2019-20 Grand Prix of Bytedance took place on Sunday (results, top 5 on the left). Team japan02 has really aced it this time, finishing 2 problems ahead of the three leaders of the overall standings and winning their second stage this season with a bang. Congratulations!

Our solution to problem B was quite unusual, and was based on intuition a lot more than is typically reasonable. The problem went like this: a string of 0s and 1s is called balanced if the number of 1s in any two its circular substrings of the same length differs by at most 1. A circular substring of a string s is any string with the length not exceeding the length of s that can be read after we write s on a circle; in other words, it's either a normal substring of s, or a suffix of s concatenated with a prefix of s with total length not exceeding the length of s. You are given a string consisting of 0s, 1s and ?s. How many ways are there to replace each ? with a 0 or a 1 such that the resulting string is balanced? The length of the given string is at most 1024.

In my previous summary, I have mentioned another Open Cup problem: we have a string which is initially empty. Then we repeatedly add characters either to the beginning of the string, or to the end of the string, n times (n<=1000000). Then, we need to print n numbers: how many period lengths did the string have after each addition? A string s has a period length k if 1<=k<=length(s) and for each valid i si=si+k (note that under this definition the last repetition of a period may be incomplete). For example, the string ababa has 3 period lengths: 2, 4 and 5.

The key observation here is that once a certain length k stops being a period, it will never become one again after we add more characters! This is especially clear from the formal definition above, as we just get more constraints of form si=si+k as the number of possible values of i increases. Therefore for each length k it will be a period from step k (each string has its own length as a period length) to some step sk

We can find each particular sk using binary search in O(log(n)), using hashes to check if k is a period length in O(1), therefore we can find all sk's in O(n*log(n)), and the numbers we need to print can then be computed from those in O(n).

It's not particularly important for this problem, but note that we're only doing O(n*log(n)) equality comparisons for the hashes, therefore we're not subject to the birthday paradox, and 32-bit hashes are enough.

Thanks for reading, and check back for more!

Sunday, May 3, 2020

A 2021 week

TopCoder Open 2020 Round 1B was the first round of this week (problems, results, top 5 on the left, parallel round results, analysis). This was the last chance to get into Round 2 for those who have not qualified yet, and the problemset was correspondingly even more on the easier side. It turned out that a nonzero score was enough to advance, so contestants were mostly competing against the problems instead of between themselves. In order to get the first place, however, one required a few successful challenges as well. jzd got three and won — congratulations!

Google Code Jam Round 1C followed on Saturday (problems, results, top 5 on the left). Just like in the TCO round, this was the last chance to get into Round 2, but the competition was more intense: one needed to solve at least the first two problems completely to advance, and do it in about one hour. Just half an hour was enough for all three problems for Rafbill and maroon, and Rafbill held to a 4-second edge to claim the win :) Well done!

Finally, Open Cup 2019-20 Grand Prix of Nanjing wrapped up the action (results, top 5 on the left, analysis). Team NNSU Almost Retired claimed their second win of the season and seem to be in great form — congratulations! It looks like they'll need to maintain the form for quite a bit longer, as ICPC 2020 World Finals got rescheduled to 2021. Team USA1 got 11 problems as well, but an enormous amount of incorrect attempts has ruined their penalty time.

Problem B was very easy for some teams, but we got stuck on it for the whole contest, only to come up with a solution right after the end :) It went like this: we have a string which is initially empty. Then we repeatedly add characters either to the beginning of the string, or to the end of the string, n times (n<=1000000). Then, we need to print n numbers: how many period lengths did the string have after each addition? A string s has a period length k if 1<=k<=length(s) and for each valid i si=si+k (note that under this definition the last repetition of a period may be incomplete). For example, the string ababa has 3 period lengths: 2, 4 and 5. Can you see the trick?

Thanks for reading, and check back next week!

A Lucas week

Codeforces Round 637 last week was declared unrated after the reference solution for problem E turned out to be incorrect, and the problem seemed to be unsolvable.

Therefore TopCoder SRM 784 was the first round that counted (problems, results, top 5 on the left, TCO qualification standingsanalysis). The constructive hard problem required one to come up with a pattern, and some people did it much faster than others :) I could not come up with an explicit pattern myself, instead finding a pattern in row and column sums from solutions of small inputs, and finding the actual grid using max flow. This took me about 3 times longer than the fastest solutions to this problem, though, so I was out of contention for the top places. tourist was among the fastest solvers there, and he was also twice faster than me on each of the first two problems. Congratulations on the well-deserved victory!

Open Cup 2019-20 continued its non-stop return with the Grand Prix of Moscow (results, top 5 on the left, analysis). After the same 3 teams split the first 12 stages between them, we started getting new winners, and team japan02 is already the sixth team to win a stage this season. Congratulations on the superb performance!

In my previous summary, I have mentioned an Open Cup problem: you are given k (k<=200) distinct positive integers ai (ai<=105). Consider all sequences of length n (n<=1018) with all elements equal to one of the given integers. Count the number of those sequences with the given sum s (s<=1018). Is that number odd or even?

Since we're asked about the answer modulo 2, a natural idea is to try and discard groups of sequences which have an even count. There are actually multiple ways to do that, but for us the most natural way was to notice that since the order of the numbers matters, we can count the number of different orderings for each multiset of n numbers that add up to s, and discard those multisets for which this number of orderings is even. When our multiset has ci occurrences of the number ai, this number of orderings is given by the multinomial coefficient: n!/(c1!*c2!*...*ck!).

A quick Google query led us to this mathoverflow post, which tells that this coefficient is odd if and only if there is no carry when performing the addition c1+c2+...+ck=n modulo 2. This, in turn, means that for each bit that is set in n, there must be exactly one ci that has this bit set. Or, in other words, if the binary representation of n is 2b1+2b2+...+2bt, then we need to count the parity of the number of ways to take one of the ai's 2b1 times, one of the ai's 2b2 times (possibly the same one), and so on, so that the sum of all that is s.

I find it beautiful that there is a completely orthogonal argument that leads to exactly the same subproblem, obtained by noticing that the number of non-palindromic sequences is always even. You can find more details in this Codeforces comment.

Now, how to solve the subproblem? Given that s is up to 1018, it still seems to be a pretty tough instance of the knapsack problem. A naive approach would be to implement dynamic programming which counts the [parity of] the number of ways to reach every sum u<=s using the first i powers of two 2b1, 2b2, ..., 2bi, but the running time of that would be O(s*k*t), which is enormous.

However, we can notice that many states of that dynamic programming are actually useless! Suppose we're processing the powers of two in increasing order, and we have processed all powers of two up to but not including 2bi. Then everything that we will add later will be divisible by 2bi. Therefore, only the states that are equal to s modulo 2bi can contribute to the final answer. On the other hand, our current sum will not exceed m*(1+2+...+2bi-1)<m*2bi, where m is the maximum of ai. So there are at most m possible sums with the given remainder modulo 2bi, therefore our new solution runs in time O(m*k*t).

Since m=105, k=200 and t=60, this is about one billion operations — somewhat borderline. It was fast enough for us during the contest, but in case it would not be, I was planning to optimize it further taking advantage of the fact that we only care about parity, and therefore can use bitsets to represent our dynamic programming arrays and do operations on them.

Thanks for reading, and check back for this week's summary!

Sunday, April 26, 2020

A caterpillar week

Codeforces Round 635 was the first round of last week (problems, results, top 5 on the left, analysis). Problems A, B, C and E1 were solved by all top competitors, but the next step was less clear. maroonrk went for the hardest problem F, solving it with just two minutes remaining. However, this was not enough to beat boboniu who did not notice the additional difficulty in E2, solving both E1 and E2 together, and then wrapped up their victory with D. Congratulations to both!

TopCoder Open 2020 Round 1A then started a busy weekend (problems, results, top 5 on the left, parallel round results, analysis). Most top competitors got a bye into Round 2, therefore the scoreboard looked more yellow this time. Pasqual45's points just from the coding phase would be enough for the first place, but they made sure of the win with 100 more challenge points. Well done!

Open Cup 2019-20 Grand Prix of Tokyo was next (problemsresults, top 5 on the left, analysis). All teams solving 4+ problems got E, F, H and I, but the remaining problems were significantly more difficult. Even though 9 problems were solved by at least one team, the best teams only got 6. Our team was close to 6 problems as well, as our last submission on K was correct algorithmically but had an off-by-one error. Congratulations to teams ext71, Polish Mafia and USA1 on actually getting to 6!

Solving problem E involved this wonderful feeling where a relatively natural statement hides a solution with multiple interesting layers that "click" together. You are given k (k<=200) distinct positive integers ai (ai<=105). Consider all sequences of length n (n<=1018) with all elements equal to one of the given integers. Count the number of those sequences with the given sum s (s<=1018). Is that number odd or even?

And finally, Google Code Jam 2020 Round 1B wrapped up the last week (problems, results, top 5 on the left). xll114514 was the first to finish all problems, but mnbvmar was the fastest of the European competitors who didn't want to wake up at 3am for Round 1A, and had one less incorrect attempt than xll114514. Congratulations on the win!

In my previous summary, I have mentioned two problems. The first one came from TopCoder: you are given a directed tournament with n<=23 vertices. You are then constructing a "power" of this tournament with nk vertices (k<=1000), with each vertex described with a k-tuple of vertices of the original tournament. The direction of the edge between two k-tuples is given by the direction of the edge between the vertices that appear in the first position the k-tuples differ. For example, the direction of the edge between (a, b, a, c) and (a, b, d, e) is the same as the direction of the edge between a and d in the original tournament. How many subsets of vertices of this big graph induce a strongly connected subgraph, modulo 998244353?

First, let's deal with the k-th power. Consider a set S of vertices of the power graph. Each of those vertices is a k-tuple of vertices of the original graph. Let's consider the first position p in those tuples which has at least two different vertices. For example, if S has vertices (abac), (abad) and (abbc), then we're interested in the third position since both a and b appear there, while in the first position we always have a, and in the second position we always have b.

Now let's consider the set T of all vertices of the original graph that appear in this position. If S is strongly connected, then T is also strongly connected, because any path can be "projected" to T and still be a valid path due to the construction of the power graph.

Now suppose T is strongly connected. It turns out that then S is strongly connected as well, because when there's an edge from a to b in the original graph, there's an edge from any vertex with a in position p to any vertex with b in position p, therefore in order to find a path from any vertex in S to any other vertex in S it suffices to find a path between the corresponding vertices in the p-th position, and then pick any matching vertices for the intermediate stopping points. Note that we rely on the fact that T has at least two vertices here, as we need a cycle in T to build a path between two vertices in S that have the same vertex in the p-th position.

We have found that the strong connectivities (is there such a word?) of S and T are equivalent. Therefore if we can solve our problem for k=1, we can solve it for any k: consider a subset T of size m>=2 of the original graph that is strongly connected. How many sets S correspond to T in the p-th position? For each of the m vertices of T we have nk-p vertices of the power graph with this vertex in the p-th position, and we need to take any non-empty subset of those, so in total we get (2nk-p-1)m subsets.

Here comes the more interesting part of the problem: what to do for k=1? More specifically, how to count the number sets of vertices of each size that induce a strongly connected subgraph in a given graph with n<=23 vertices?

My approach was based on the Floyd-Warshall algorithm. We would like to run it on all 2n subsets, which would take O(2n*n3), which is too slow for n=23. We can replace the inner loop with one bitwise or operation, bringing the running time down to O(2n*n2) — still too slow.

However, we can now merge the iteration over subsets with the Floyd-Warshall algorithm! The outer loop of Floyd-Warshall iterates over intermediate vertices, so if we take a recursive algorithm that iterates over the 2n subsets, we can do one iteration of Floyd-Warshall outer loop as soon as we've decided that a vertex belongs to the subset. Therefore we will need to do at most one iteration of the Floyd-Warshall outer loop at each level of recursion, bringing our running time to O(2n*n+2n-1*n+2n-2*n+...)=O(2n*n)!

Here is the relevant part of my contest submission with some comments:

// Recursive search: we have decided that from the first
// "at" vertices we have taken the subset "mask",
// and the intermediate state of Floyd-Warshall after
// iterating over them is in "preach".
private void rec(int at, int mask, int[] preach) {
    if (at >= n) {
        // Check if the graph is strongly connected.
        // "preach" already has the transitive closure for the
        // chosen subset, so we check if it's complete.
        int count = 0;
        for (int i = 0; i < n; ++i) if ((mask & (1 << i)) != 0) {
            ++count;
            if ((preach[i] & mask) != mask) {
                return;
            }
        }
        if (count >= 2) {
            ++res[count];
        }
        return;
    }
    // Do not take at-th vertex.
    rec(at + 1, mask, preach);
    // Take at-th vertex, meaning that we need to do one more
    // iteration of Floyd-Warshall outer loop.
    // To avoid allocating new arrays in Java too often,
    // I have pre-allocated one array per level of recursion
    // which I overwrite here.
    int[] nreach = reach[at + 1];
    System.arraycopy(preach, 0, nreach, 0, n);
    for (int i = 0; i < n; ++i) if ((nreach[i] & (1 << at)) != 0) {
        nreach[i] |= nreach[at];
    }
    rec(at + 1, mask | (1 << at), nreach);
}

The second problem was from Codeforces: you are given a tree with n<=105 vertices. You want to draw n closed curves on the plane, one per vertex, in such a way that two curves intersect if and only if the corresponding vertices are adjacent in the tree. What is the maximum size of a chain of curves that are nested in one another (but do not intersect) in such drawing?

The space of possibilities for drawing arbitrary curves on the plane is enormous, so it's not clear how to approach this problem. First, we can notice that the curves for adjacent vertices intersect, and therefore can't be nested in one another. Is that the only restriction? In other words, can we choose the curves in such a way that any two curves either intersect (when the corresponding vertices are adjacent) or are nested (if they're not)? At first I believed it was always possible, then finding the size of the longest chain is just finding the size of the largest independent set in a tree, which can be done with a simple greedy from the leaves. I made a quick submission, which turned out to be incorrect :)

Then I started to think when does this approach fail. After some doodling on paper, I have found a relatively small counterexample (on the left). Moreover, I have also come up with a description of cases where it does in fact work: they all seemed to be caterpillar trees! After some more doodling, I was able to build the set of curves that achieves the perfect nesting for any caterpillar tree (on the right, the curves corresponding to the main path of the caterpillar are red, and the dotted blue curves corresponding to the legs of the caterpillar could be multiple at each level).

Finding the size of the largest independent set in all sub-caterpillars of the given tree is slightly more tricky, but still doable with dynamic programming on the tree. However, is it the correct solution? I did not have a proof, but after so much doodling that didn't provide a counterexample it felt right, so I decided to try with another submission, which worked. You can find the proof in the editorial :)

Thanks for reading, and check back for this week's summary!

Monday, April 13, 2020

A Gennady week

Last week saw the first elimination round of Google Code Jam 2020, Round 1A (problems, results, top 5 on the left). Gennady.Korotkevich started his title defense with another commanding performance in the middle of the night, wrapping up 10 minutes before everybody else. Well done!

TopCoder SRM 783 followed later on Saturday (problems, results, top 5 on the left, my screencast, analysis). tourist, bqi343 and scott_wu were much faster than other top competitors on the 1000, but Scott's 250 failed to handle perfect squares correctly, and Ben's single challenge was still not enough to catch Gennady. Congratulations on the second victory of the day!

The hard problem had two almost independent parts: solving the k=1 case, and understanding how to reduce k>1 to k=1. The problem statement was: you are given a directed tournament with n<=23 vertices. You are then constructing a "power" of this tournament with nk vertices (k<=1000), with each vertex described with a k-tuple of vertices of the original tournament. The direction of the edge between two k-tuples is given by the direction of the edge between the vertices that appear in the first position the k-tuples differ. For example, the direction of the edge between (a, b, a, c) and (a, b, d, e) is the same as the direction of the edge between a and d in the original tournament. How many subsets of vertices of this big graph induce a strongly connected subgraph, modulo 998244353?

The k=1 part — where the graph with at most 23 vertices is just given to you directly — allowed quite a lot of interesting and different solutions, so I encourage you to try solving at least that part :) I believe my solution did not in fact need the graph to be a tournament, it could handle any directed graph, so you can try solving either version.

This was the first time I was recording a screencast while using an external 4K monitor, which brought a couple of challenges: first, it seems that encoding a 4K video was using nearly all resources of my machine, even though I was using the same bitrate as I did for the 1080p video. This meant I had to wait for half a minute every time I ran my solutions, somewhat suboptimal :) Second, the fonts in the arena looked really funny, most likely because I had font scaling enabled in Windows to be able to read text in such resolution. Any suggestions on how to resolve those?

Codeforces Round 633 took place on Sunday (problems, results, top 5 on the left, analysis). My approach turned out to be very similar to previous week's Codeforces round, solving four problems at reasonable speed but then giving up too easily on the fifth, instead looking for challenges when there were virtually none available — the only failing solution in my room was hos.lyric's D, and I did not really suspect it to be incorrect. tourist was in a similar spot in the middle of the leading pack after solving four problems, but went on to win the round after solving E. Congratulations on three wins in a row!

Radewoosh jumped into the second place with a correct submission on E with 7 minutes remaining, then almost blew it by actually submitting a version without bitsets, and then some version with bitsets but also with too much debug enabled, and finally submitting the correct bitset solution again with just seconds remaining in the round. What was that? :)

Problem D was quite nice: you are given a tree with n<=105 vertices. You want to draw n closed curves on the plane, one per vertex, in such a way that two curves intersect if and only if the corresponding vertices are adjacent in the tree. What is the maximum size of a chain of curves that are nested in one another (but do not intersect) in such drawing?

Thanks for reading, and check back next week!

A cheese week

Codeforces hosted its Round 631 during the Mar 30 - Apr 5 week (problems, results, top 5 on the left, analysis). I had a relatively good start on the first four problems, but could not make progress in the last one and switched to challenging with about 20 minutes remaining, also without success. Um_nik, on the other hand, had an even better start and did not give up so easily on E, earning the ultimate reward of being the only contestant to solve all problems. Great job!

When solving problem C, I could not come up with a solution analytically for quite some time, so I've used the adage "so many people got it accepted, the solution must be simple!" and started thinking in terms of what a simple solution might look like. I've came up with the [ultimately] correct greedy very quickly, but could not prove that it works. After some more time, I've decided to just implement and submit it, relying on pretests and the adage to help :)

When this approach works, it is satisfying, but I feel that using it might be detrimental in the long run: it greatly reduces the motivation when solving the problems analytically, the opportunity to just start submitting good-looking solutions is too tempting. Or maybe it really is a part of the game?

Thanks for reading, and check back for more!