Tuesday, January 28, 2020

Best problems of 2019

Just like last couple of years (2017, 2018), I've went through the problems I mentioned in 2019 to find the ones I liked the most. I have also looked at some of the problems recommended in this post, in this post, and in various private messages. Of course, this is still an entirely subjective exercise, and it is certainly easier for me to like a problem that I have solved or tried to solve than one that I did not. Here is the shortlist (for those interested, here is a slightly bigger one), as usual in chronological order:
  • The Open Cup problem "Alien Invasion" about finding an area of a polygon by interactively asking about areas of convex hulls, by ???, with solution in this post.
  • The AtCoder problem "Triangular Lamps Hard" about finding the original state of a cellular automaton on a triangular grid, by rng_58 and yosupo, with solution in this post.
  • The Open Cup problem "Equanimous" about inserting + and - between digits of a number to get the smallest positive value and grouping all numbers in a large segment by that value, by tangjz, with solution in this post.
  • The TopCoder problem "SpanningSubgraphs" about counting spanning subgraphs with each number of edges, by lewin, with solution in this post.
  • The IOI problem "Split the Attractions" about splitting a graph into three parts of given size such that at least two are connected, by LGM and Saeed_Reza, with solutions discussed in this Codeforces post.
Which one do you think is the very best? Also, please help me fill the unknown problem authors in comments!

Monday, January 27, 2020

TCO20 stage 2 leaderboard

Since the official leaderboard for TCO20 stage 2 is not yet ready, I've put together a small script to compute it. Here's the current top 30:

RankHandleScorePoints
1Petr143206.22
2tourist133309.85
3lyrically122646.81
4bqi343122301.09
5Um_nik102383.93
6hitonanode101588.43
7yosupo91537.25
8_aid91506.97
9natsugiri91485.10
10kmjp91464.26
11maroon_kuri91232.54
12neal_wu91152.15
13IH1998041291134.90
14ShadoWsaZ81328.25
15KevinWan71516.60
16ksun4871349.06
17Egor71149.52
18redocpod71140.82
19Vasyl[alphacom]71120.55
20cerberus977781.12
21socketnaut7552.92
22Kalam13261623.00
23KKT8961396.76
24ecnerwal61245.65
25darnley6846.29
26kuniavski6821.25
27square10016788.87
28keymoon6777.42
29nwin6763.49
30Jatana6640.78

Enjoy! In the future I will most likely just rerun the notebook instead of making new posts, so the updated standings will appear there.

Sunday, January 26, 2020

A cyclotomic week

TopCoder SRM 776 was the main event of this week (problems, results, top 5 on the left, analysis). After the coding phase it seemed as if bqi343 would catch up with me in the TCO20 race, but I was quite lucky twice: first, since my incorrect solution for the 1000 passed the system tests; second, since bqi343's 250 has failed. As a bonus, now I have learned about cyclotomic polynomials (I guess it's more like re-learned — surely my mathematician degree should have got me covered here).

The medium problem was very nice as well. There are n=2*a+b pieces of string, out of which a have both ends red, a have both ends green, and b have one red and one green end, so we have n red ends and n green ends in total. We will randomly pair the red and green ends in one of n! possible ways, and tie the corresponding ends together. What is the expected number of cycles we will get? a and b are up to a million.

In my previous summary, I have mentioned a sub-problem of another TopCoder problem: for which pairs of positive integers a <= b can we split all integers from the set {a, a+1, a+2, ..., b-1, b} into two parts with equal sum?

First of all, the sum of all numbers (a+b)*(b-a+1)/2 must be even. Since the two parts in the product (a+b)*(b-a+1) have different parity, one of the parts must be divisible by 4 for the sum to be even. In case the size of the set (b-a+1) is divisible by 4, we can always make such a split: for each four consecutive numbers, we can split them independently as x+(x+3)=(x+1)+(x+2).

Now, what happens when (a+b) is divisible by 4? The size of the set is odd in this case, so we must split into two unequal parts, the smaller part will have at most (b-a)/2 elements, and the bigger part at least (b-a)/2+1 elements. The sum of (b-a)/2 biggest elements in the set is equal to (b-a)/2*(b+b-(b-a)/2+1)/2=(b-a)*(3b+a+2)/8. The sum of (b-a)/2+1 smallest elements in the set is equal to ((b-a)/2+1)*(a+a+(b-a)/2)/2=(b-a+2)*(3a+b)/8. If the former is smaller than the latter, clearly there's no good split as the smaller part will always have the smaller sum.

It turns out that this condition is not just necessary but also sufficient: if we can somehow get the smaller part to have bigger or equal sum, we can make it have equal sum because we can always repeatedly decrease the sum by 1: find two numbers x and x+1 such that x is in the bigger part and x+1 is in the smaller part, and swap them. This argument is the most beautiful part of the solution in my opinion.

The condition (b-a)*(3b+a+2)/8>=(b-a+2)*(3a+b)/8 can be simplified as b>=a+2*sqrt(a), thus our final answer looks like:
  • either b-a+1 is divisible by 4, or
  • a+b is divisible by 4 and b>=a+2*sqrt(a).
Thanks for reading, and check back next week (hopefully for the best problem of 2019 vote as well)!

Tuesday, January 21, 2020

A mathematics week

There were two rounds last week. TopCoder SRM 775 took place on Thursday (problems, results, top 5 on the left, analysis). Tourist has earned a commanding victory while having the fastest time on all three problems, which also meant that nobody could get the 5 points towards the TCO20 qualification. Well done :)

The main part of the hard problem was a nice puzzle that could well appear in a mathematics olympiad: for which pairs of positive integers a <= b can we split all integers from the set {a, a+1, a+2, ..., b-1, b} into two parts with equal sum?

Codeforces Round 614 followed on Sunday (problems, results, top 5 on the left, analysis). There was just one accepted solution for each of the two hardest problems, coming from Um_nik and tourist who have therefore occupied the first two places with a huge margin. Um_nik's problem was worth more points, and he had therefore won the round. Congratulations!

Thanks for reading, and check back next week!

A matroid week

Two weeks ago, TopCoder SRM 774 has started a new race for a TCO20 spot (problems, results, top 5 on the left). The 1000-pointer was tricky both algorithmically and from the implementation standpoint, causing a few resubmissions and a few failed systests for high-scoring solutions. As a result, the total scores were not that high and the importance of the challenge phase was amplified.

In my previous summary, I have mentioned a Codeforces problem: you are given a 20x20 grid colored as a chessboard, with the top-left corner colored black. Some of the cells are removed from the grid, the remaining cells form a 4-connected piece and include the top-left corner. You need to insert some walls between the remaining cells in such a way that we get a good labyrinth: there must be exactly one way to get from each cell to each other cell. Moreover, we want each black cell (remember the chessboard coloring) except the top-left cell to not be a dead-end in the labyrinth: each such cell must have at least two accessible neighbors.

When solving this problem during the round, I have made the correct first step: we want to find a spanning tree where the degree of each black cell is at least two, which is equivalent to finding a spanning forest where the degree of each black cell is at least two as we can always add more edges to get a tree.

But then I've tried to find some greedy approach that takes two edges for each black vertex without forming cycles, realized that it's not always possible, thought that if we want to add an edge that forms a cycle we need to remove some other edge from this cycle and choose another edge for its black vertex, and then somehow failed to notice that I'm just describing finding an alternating path for a matroid intersection problem :) The matroids in question are the cycle matroid and the matroid where independent sets have degree <=2 for each black vertex, and we need to check if the biggest independent set in their intersection has degree of exactly 2 for each black vertex.

In that summary, I have also described a solution to an AtCoder problem that felt like unexplained magic. Um_nik has brought a simpler quadratic solution to my attention: instead of starting from n scores of 1 and adding 1 to a suffix n-1 times, we can start from n scores of n and subtract 1 from a prefix any number of times! The reason we don't need to limit the number of operations to n-1 in this approach is that if the first problem has zero or negative score, then the constraint about the sum of the first k+1 numbers being greater than the sum of the last k numbers would be necessarily violated.

This means we can remove the number of operations dimension from our dynamic programming and it becomes quadratic. This is not directly equivalent to the magical solution, but at least it explains why there's a fertile ground for one.

I have also promised to organize a poll about the best problem of 2019, but for that I need to review all my posts from last year and also the other excellent candidates you shared with me on Codeforces, so this will take some more time. Stay tuned :)

Thanks for reading, and check back for more!

Monday, January 6, 2020

A red maxflow week

Codeforces ran two contests this week. Hello 2020, as the name suggests, was the first round of the year (problems, results, top 5 on the left, my screencast, analysis). Only four contestants could solve the hardest problem G, and only two of them also solved the remaining problems: mnbvmar and TLE. They had roughly the same speed as well, but mnbvmar only had two attempts that failed pretests compared to TLE's ten, and that's what made the difference. Congratulations to both!

I could solve the first five problems reasonably quickly, and I was quite excited about inventing the randomized solution to problem D and quickly recognizing that problem E is more or less equivalent to a very old problem about counting the number of 4-tuples of points that form a convex quadrilateral (I have a feeling that I wrote about it in this blog, but I seem to be unable to find the entry). However, the last two problems proved insurmountable for me, and I spent most of the time trying to get solutions that were clearly not the intended ones to work: max-flow on a graph of size n*log(n) in F (it turns out it was possible to succeed in this way — check out izban's solution as an example), and repeated randomized search in G. I guess the time might have been better spent just thinking on paper, but then the screencast would not be so exciting :)

On a related note, quite a few people have noticed that I've switched to C++ in the recent contests, and asked why. I don't have much to add to this Egor's comment. In the past I have tried switching to C++ a few times and noticed that I keep fighting with it during the contests instead of solving problems, and I do have a similar feeling now as well despite the better tools. However, I will try to keep using C++ for a longer time to see if things improve :)

Here is the hardest problem from this round for you to try as well: you are given a 20x20 grid colored as a chessboard, with the top-left corner colored black. Some of the cells are removed from the grid, the remaining cells form a 4-connected piece and include the top-left corner. You need to insert some walls between the remaining cells in such a way that we get a good labyrinth: there must be exactly one way to get from each cell to each other cell. Moreover, we want each black cell (remember the chessboard coloring) except the top-left cell to not be a dead-end in the labyrinth: each such cell must have at least two accessible neighbors.

Codeforces Round 612 followed a day later (problems, results, top 5 on the left). The sets of problems solved by the top contestants were very diverse (even though not visible in the top 5 screenshot, problem F was also solved by two contestants), but in the end ainta just solved more problems and won. Well done!

In my previous summary, I have mentioned a couple of problems. The first one came from AtCoder: an assignment of integer scores between 1 and n (not necessarily distinct) to n programming contest problems (n<=5000) is called good if for each k the total score of every set of k problems is strictly less than the total score of every set of k+1 problems. How many good assignments of scores exist, modulo the given prime m?

The first step to solve this problem is to notice that we can get rid of "for each k" qualifier, replacing it with just k=(n-1)/2, rounded down. The reason for this is that in case the constraint is violated for smaller k, we can just add the same problems to both sides to reach k=(n-1)/2, and in case it is violated for larger k, the two sets necessarily have an intersection, so we can remove the intersection until we reach k=(n-1)/2 as well. Also, instead of "every set" we can say: the total score of k problems with the largest scores must be less than the total score of k+1 problems with the lowest scores.

To enforce that last constraint, it would be useful if our problem scores were sorted. This can be achieved with the following more or less standard process: we start with all problem scores equal to 1. Now we do the following n-1 times: add 1 to a suffix of problem scores (this suffix could also be empty or all problems).

Now we can keep track how each such operation affects the value we're interested in: the sum of (n-1)/2+1 smallest elements minus the sum of (n-1)/2 largest elements. Going from the empty suffix to the full suffix, they change that value by 0,-1,-2,...,-(n-1)/2,-(n-1)/2 (if n is even),-((n-1)/2-1),...,-2,-1,0,1. Let's denote that multiset of changes as C. In the end we need the value to be positive, and it starts with 1 (when all problem scores are 1).

Our problem can then be restated as follows: consider all ways to choose n-1 values with replacement from the multiset C (the same value can be chosen as many times as we want). How many ways have a non-negative sum?

Since the sum needs to be non-negative and the only possible positive change is 1, this yields a O(n3) dynamic programming solution that can be sped up to O(n2*logn) and get accepted by skipping the states from which we can't reach the final state: let's process the elements of C in decreasing order, and maintain dpi,j,k as the number of ways to choose j values from the first i elements of C such that their sum is k. The answer is the sum of dpn+1,n-1,k over all k>=0.

However, there exists a magical way to speed this up to O(n2). Let's rearrange our dynamic programming slightly: now we will process the elements of C in increasing order, and dpi,j,k will now be the number of ways to choose j values from the first i elements of C such that their sum is -k. Also, we will stop before we process the only positive element 1 (because that would need special handling anyway as the sums stop being only negative), but also (!) before we process one of the two zeros that we have — in other words, we will only consider the first n-1 elements of C in increasing order.

How to compute the answer from the last row of this dynamic programming? Suppose we have computed dpn-1,j,k. Now we need to add some number of 0s and some number of 1s, let's denote those as z and o respectively. We need to have j+z+o=n-1 to get n-1 total changes, and k<=o to get a non-negative sum. The solutions to these two constraints look like: o=k, z=n-1-j-ko=k+1, z=n-1-j-k-1, and so on until z reaches 0. The number of solutions is thus max(0, n-j-k), so our answer is a sum of max(0, n-j-k)*dpn-1,j,k.

Here comes the magic: since max(0, n-j-k) only depends on j+k and not on j and k separately, and since our transitions just add one to j and the current element of C to k, and so the thing being added does not depend on the values of j and k themselves, we can collapse our dynamic programming states to keep track of j+k only, instead of j and k separately! This means we'll have O(n2) states and O(n2) complexity.

What remains a mystery is: how does one come up with this magic? I guess one could just stumble upon it while trying different approaches. Maybe a more principled way is to use the approach from my old post: if we just implement the O(n3) dynamic programming which processes the elements of C in increasing order, and find out the contribution of each state to the final answer, we can notice that the contributions of states with the same value of j+k are the same and collapse them. Is there any other way that makes this observation look less magical?

The second problem I mentioned was from Codeforces: you are given n integers ai such that i-n<=ai<=i-1 (i goes from 1 to n), in other words one integer from [-(n-1),0], one from [-(n-2),1], ..., one from [0,n-1]. You need to find any nonempty subset with a zero sum. You are guaranteed that such subset always exists, which is by itself quite a hint. n is up to a million.

The key idea here is to realize that keeping integers from different segments is a bit clumsy, so let's shift the segments: the first one by n-1, the second one by n-2, and so on. Now all integers are chosen from the segment [0, n-1] which is nice and symmetric, but instead of a zero-sum subset we need to find a subset where the sum of values equals to the sum of shifts.

To restate, we have reduced our problem to the following: you are given n integers a0, a1, ..., an-1, each between 0 and n-1. You need to find a set S of indices such that ΣiSiiSai.

When I obtained this reduced problem during the round, it felt really familiar to me, so I've tried to google the answer without much success. It turns out that it's simply quite easy: build a graph from the arrows i->ai, and just find any cycle in this graph. The indices corresponding to this cycle will satisfy the equality above since the sums will just be the same numbers but in different order.

I will be doing a poll for the best problem of 2019 soon, and I will mostly be picking the candidates from the problems I explained in this blog. However, I realize that there were many great problems in 2019 that I just did not encounter, so if there is a problem you feel should be included in the shortlist that was in a contest that I did not participate in, please mention it in comments! Feel free to also post links to similar discussions, such as this post.

Thanks for reading, and see you next week!

Saturday, January 4, 2020

A mobile-first week

Last week has wrapped up the competitive 2019 with two rounds. AtCoder Grand Contest 041 took place on Saturday (problems, results, top 5 on the left, analysis). mnbvmar, ecnerwal and Um_nik in first three places all have a different set of problems solved, and mnbvmar's set was the best one. He also tried to submit a solution that just tries random decisions until time runs out for E in the last minute, but that did not fly. Nevertheless, congratulations on the win!

I was writing this round on the go, and around the middle of the round my laptop shut down because of low battery, which was very exciting as you might guess :) After I've turned it back on it continued to work for a long time somehow, suggesting that the problem lies with the battery level detection. A few minutes before the end of the round it shut down again, so I've tried to fix my solution to problem A from my phone. Unfortunately I was a few seconds too slow, otherwise it would have been a nice achievement :)

Problem D in this round had an awesome intended solution, even though most contestants managed to squeeze in a more boring one: an assignment of integer scores between 1 and n (not necessarily distinct) to n programming contest problems (n<=5000) is called good if for each k the total score of every set of k problems is strictly less than the total score of every set of k+1 problems. How many good assignments of scores exist, modulo the given prime m?

This round has concluded the selection of 8 AtCoder World Tour finalists that would come to Japan in February for the onsite finals (results, top 8 on the right). With this win, mnbvmar has jumped onto a departing train (there is such idiom in Russian, вскочить на подножку уходящего поезда, but I'm not sure if there is a direct English equivalent or a different idiom with the same meaning — is there?) and overtook eatmore by just 6 points. See you all in Japan!

Codeforces Good Bye 2019 followed on Sunday (problems, results, top 5 on the left, my screencast, analysis). Not without help from a notorious coincidence and a bad day for tourist, Radewoosh has solved everything, won the round, ended the 2019 top rated, and was still a bit salty. Still, congratulations!

Problem G generated some conflicting opinions, but I have enjoyed solving it quite a bit: you are given n integers ai such that i-n<=ai<=i-1 (i goes from 1 to n), in other words one integer from [-(n-1),0], one from [-(n-2),1], ..., one from [0,n-1]. You need to find any nonempty subset with a zero sum. You are guaranteed that such subset always exists, which is by itself quite a hint. n is up to a million.

Thanks for reading, and see you in the present!

Friday, January 3, 2020

An overall week

Codeforces Global Rounds series of 2019 has concluded during the Dec 16 - Dec 22 week with the Round 6 (problems, results, top 5 on the left, analysis). With problem F being tricky to implement but quite standard, problem G requiring to notice a complex pattern after implementing an ineffective solution, and problem H relying on some cool maths coming almost out of nowhere, there was plenty of ways for top contestants to pick their poison. Only 1919810 and sunset could solve two of those three, and 1919810 got all easier problems correct as well thus securing a confident victory. Well done!

The choice of counting the 4 best performances out of 6 for the overall standings (top 5 on the right) seemed to turn out quite nicely, allowing contestants to skip some rounds and/or to enjoy a really bad day sometimes, although maybe Radewoosh would prefer 2 out of 6 :) Thanks to Codeforces and to XTX Markets for organizing the series!

TopCoder Open 2020 points-based qualification stage 1 also concluded that week, with the SRM 773 (problems, results, top 5 on the left, my screencast, TCO20 qualification standings). There was no stopping tourist this time, with yet another dominating coding phase performance and another 175 challenge points he was simply out of reach and thus finally earned the 5 qualification points he needed to qualify for TCO20. I do not have my TopCoder stats scripts working anymore to support this claim with data, but it feels like he has really stepped up the challenge game a lot recently. Huge congratulations!

Open Cup 2019-20 Grand Prix of Xi'An on Sunday sent two problems to the New Year Prime contest (results, top 5 on the left). Having penalty time almost twice as big as that of the second-placed NNSU Almost Retired, team USA1 had no other way to victory except solving more problems, which they delivered by becoming the only team to solve L in the last hour, and then finally getting F from the 24-th attempt. Congratulations!

With this win, team USA1 has almost caught up with team Past Glory in the overall standings after 2019 (top 5 on the right), promising us an exciting second half of the season in 2020!

In my previous summary, I have mentioned a TopCoder problem: you are given 100000 points on the plane and a number k. Given a subset S of the set of given points, and denoting all other given points as its complement T, we define the score of S as the sum of pairwise Manhattan distances between the points in S minus the sum of pairwise Manhattan distances between the points in T. What is the largest (by the number of elements) subset of the set of given points such that its score does not exceed the given number k?

The key insight in this problem is to consider how does the score of a set change when we add one more point to it. The distances from this point to the points already in S are added to the score; the distances from this point to the points not in S are also added to the score as they used to be subtracted from it and now they are not! Therefore adding a point to S simply increases the score of S by the sum of distances from this point to all other points.

Now it is clear that we should just start with empty S and add points in order by this sum of distances until we exceed k.

Thank for reading, and check back for more!

A generating week

TopCoder returned with its SRM 772 during the Dec 9 - Dec 15 week (problems, results, top 5 on the left, my screencast). Once again tourist had to settle for the second place despite great coding and challenge phases because ksun48 could solve the hardest problem. We were discussing during the TopCoder Open onsite that ecnerwal seems to just think in generating functions, and apparently so does ksun48, even though he tried to hide the fact by explaining his combinatorial-ish solution :) Congratulations on the win!

I was comfortably in the top 10 this time, meaning that both me and tourist each got 4 TCO20 qualification points again and I kept my 1-point lead going into the final SRM of the stage.

The medium problem in this round was very nice. You are given 100000 points on the plane and a number k. Given a subset S of the set of given points, and denoting all other given points as its complement T, we define the score of S as the sum of pairwise Manhattan distances between the points in S minus the sum of pairwise Manhattan distances between the points in T. What is the largest (by the number of elements) subset of the set of given points such that its score does not exceed the given number k?

Codeforces then hosted two rounds on the weekend. Round 606 took place on Saturday (problems, results, top 5 on the left, analysis). With a whole 1 accepted solution for both E and F combined, the round was effectively decided on the first four problems. Ainta was the fastest and won the round, well done!

Round 607 followed on Sunday (problems, results, top 5 on the left, analysis). This round got a quite different top 5, with only Radewoosh appearing in both. ksun48 got most of his speed advantage over Um_nik on the easier three problems, and then kept the lead by solving D and E in a comparable amount of time. Congratulations on the win!

Thanks for reading, and check back for more!

An intriguing week

Codeforces Round 604 took place during the Dec 2 - Dec 8 week (problems, results, top 5 on the left, analysis). MiFaFaOvO has won the round, taking some advantage from the fact that problem E has appeared before, but still solving F in a bit more than half an hour while his competitors could not do that in an hour. Well done!

Open Cup 2019-20 Grand Prix of Beijing wrapped up the week on Sunday (results, top 5 on the left). Team Past Glory has continued to break away in the overall standings, winning the round and solving all problems to boot (they would still have won even without solving D at the end of the contest). Congratulations on the win!

In my previous summary, I have mentioned my NEF problem: there are 2n elements (n>=3). We can compare any two elements, and there are no equal ones, so one will compare bigger than the other. Our goal is to make such set of comparisons that uniquely determines the n biggest elements out of 2n, but not the ordering of those n elements. In other words, there must be at least two possible orderings of those n elements that are consistent with the outcomes of comparisons.

For large values of n, since the n biggest elements can be determined in O(n) and sorting them requires O(n*logn), just applying any O(n) algorithm will do as we will not do enough comparisons to determine the order, and some randomized approaches also work. The real problem lies in tackling small values of n, roughly between 3 and 7.

One could imagine that for such small inputs we could do some kind of exhaustive search, but it turns out that already for n=6 the state space is enormous as we have 12! possible inputs. Therefore, we need to come up with an actual algorithm :)

Initially I did implement the exhaustive search to find a solution for n=3, and then came up with a way to obtain a solution for n+1 from a solution for n. However, together with Pavel Kunyavskiy we could come up with a simpler approach.

Let us take arbitrary n+1 elements and split them arbitrarily into two groups of size at least 2, for example of size 2 and n-1. Now let's find the smallest element in each group in any possible way (using only comparisons within the group), and then compare those two elements between themselves. The one which compares smaller is the smallest among the n+1 chosen elements, and therefore is not among the n biggest elements, so we can discard it from consideration.

Now let's add one more element to one of the two groups in such a way that both have size at least 2, and repeat the step above, discarding one more element from consideration. We repeat this until there are no more elements to add (discarding n elements in total).

In the end we're left with the n biggest elements split into two groups, and we have never compared any element from the first group to any element of the second group, therefore there are at least two (more precisely, at least three) possible orderings of those n elements.

Thanks for reading, and check back for more!

A 4-point week

With TCO19 over, the qualification for TCO20 is well underway, and TopCoder SRM 771 was a part of that during the Nov 25 - Dec 1 week (problems, results, top 5 on the left, my screencast, analysis). Tourist did all he can to bounce back from a resubmit on the 1000-pointer, found 175 challenge points but was just 2 points short to overtake majk with his wonderful 800+-point solution :) Congratulations to both!

I was implementing the solution described in this comment, but an internal assertion kept failing on one of the samples. With just seconds left in the round I've removed the assertion and submitted, but of course it still failed the system tests. It turns out the idea was incorrect as well, but the system tests would not have caught that, so one could say I was close :) Luckily for me, many others have failed the system tests and I barely climbed into the top 10, so both myself and tourist gathered 4 points for the TCO20 qualification standings.

On the ICPC side, the Northern Eurasia Finals took place in St. Petersburg, Barnaul, Tbilisi and Almaty on Sunday (problems, results, top 5 on the left, broadcast, online mirror resultsanalysis). The strong favorite team NNSU Almost Retired did well but team SPbSU 25 was a bit faster and won, both teams at 10 problems while others got at most 9. Congratulations to both teams!

In the online mirror, three teams placed above them but as I understand none of those are active/eligible for ICPC. Team Cafe Mountain in the 7th place actually is a current ICPC team competing for Seoul NU (please correct me if I'm wrong!)

I have set the interactive problem I for this round, which went like this: there are 2n elements (n>=3). We can compare any two elements, and there are no equal ones, so one will compare bigger than the other. Our goal is to make such set of comparisons that uniquely determines the n biggest elements out of 2n, but not the ordering of those n elements. In other words, there must be at least two possible orderings of those n elements that are consistent with the outcomes of comparisons. Can you come up with an algorithm with a requirement that it does not do something? :)

Note that this would not be possible for n=2, as it might happen that the first two elements we try to compare are the two biggest elements, and thus we would learn a unique order for them.

Thanks for reading, and check back for more!

Thursday, January 2, 2020

A birthday week

Codeforces hosted two rounds during the Nov 18 - Nov 24 week. Round 601 took place on Tuesday (problems, results, top 5 on the left, analysis). Five contestants could solve all problems, and Radewoosh was quite a bit faster than others on problems C and D which gave him the victory. Well done!

Round 602 followed on Sunday (problems, results, top 5 on the left, analysis). This time even more competitors solved everything, and this time it was sunset who was slightly faster than others (some of whom have made great sacrifices to compete). Congratulations on the win!

In my previous summary, I have mentioned a TopCoder problem: you are given three integers n (1<=n<=1011), d (40<=d<=120) and b (5<=b<=62). Your goal is to return any base-b number that has exactly d digits when written without leading zeros, is divisible by n, and has at most four distinct digits when written in base-b without leading zeros.

I have found the correct idea almost immediately after reading the problem: we need to somehow make use of the birthday paradox, which is closely related to meet-in-the-middle algorithm. Roughly speaking, if we generate random numbers, then after generating O(sqrt(n)) numbers we will have generated two that have the same remainder modulo n.

However, it is important to find the right way to apply it. My approach was to split the number in two parts, keep generating random numbers with the same four distinct digits for the first and second part, and wait until we generate two that add up to 0 modulo n. This is not an exact application of the birthday paradox as we have two different random distributions rather than one, therefore it no longer guarantees that we're going to find such pair after generating just O(sqrt(n)) numbers. However, there is a hand-wavy argument: remainders modulo n of such large numbers are more or less random both for the first and for the second part, therefore the birthday paradox should still apply.

Unfortunately, there are cases where the distribution is not random at all. For example, when b=10 and n=1011, all digits before the 11-th one from the end do not affect the remainder modulo n, and therefore all randomly generated first parts will have remainder 0, therefore we'll need on the order of 1011 randomly generated second parts in order to find a match, instead of sqrt(1011). This case is easy to handle separately of course, as we can just include "all zeros" as one of the candidates for the second part. However, there are more tricky cases where gcd(n,b)>1, or when gcd(n,b-1)>1. I have managed to find a way to make this solution pass the system tests, but it was a very dangerous thing to do and the solution could have easily failed.

The right idea is to find a way to apply the birthday paradox directly. Let's generate random numbers of the full length until we find two that give the same remainder modulo n, this is bound to happen within O(sqrt(1011)) steps. The difference between these two is divisible by n, but there are two issues:
  • the difference might have less than d digits
  • even if each of the numbers has only four distinct digits, the difference might have more
It turns out that both issues can be taken care of if we generate random numbers consisting only of digits 0 and 1 in our process. The difference of any two such numbers will have only digits 0, 1, b-2 and b-1, satisfying the at most four distinct digits constraint. And in case the difference has less than d digits, we can just append enough zeros at the end since zero is one of the allowed digits, and appending zeros keeps the number divisible by n.

This solution provably terminates within O(sqrt(1011)) steps with very high probability, and one can submit it with confidence. Note that the fact that we generate numbers consisting only of digits 0 and 1, and therefore their remainders modulo n might not be uniformly distributed, does not hurt us: if the choices are not uniformly distributed, the birthday paradox collision just becomes more likely.

Thanks for reading, and check back for more!

Wednesday, January 1, 2020

A Houston week

The Nov 11 - Nov 17 week was the week of TopCoder Open finals in Houston.

The first semifinal took place on Thursday (problems, results on the left, broadcast, my Twitter thread, analysis). The easy problem was quite tricky and the sample cases did not really help check correctness, so the first five submissions in this photo were all incorrect. tourist, Egor and scott_wu later resubmitted, and this turned out crucial for Egor who was able to advance on the easy and medium. tourist, lyrically and uwi solved the hard as well, and therefore it did not actually matter if their easy was correct. Congratulations to the four finalists!

The second semifinal followed on Friday (problems, results on the left, broadcastanalysis). Five competitors solved everything during the coding phase, and I was in fourth place but with less than 25 point gap to the fifth. During the intermission, I have correctly concluded that it only makes sense for me to challenge one of the other three contestants, since if any solution of the first five fails, and it's not my solution, then I would advance anyway. However, right after the challenge phase started I've opened Um_nik's 1000, noticed that it does not handle a special case that I had to handle in my solution, and challenged with it. The challenge was unsuccessful, and I dropped to fifth place.

Luckily, it did not end that way and got even curioser after ecnerwal found two unsuccessful challenges of his own, and dropped below me — imagine how surprised and relieved I was at that point :) It turned out that he already knew that his medium was going to fail, so the challenge phase did not matter in the end as everybody who solved three problems advanced to the finals. Congratulations to xudyh, Um_nik and mnbvmar!

Here is the 1000-pointer that got me so excited that I failed to exercise good challenge phase judgement: you are given three integers n (1<=n<=1011), d (40<=d<=120) and b (5<=b<=62). Your goal is to return any base-b number that has exactly d digits when written without leading zeros, is divisible by n, and has at most four distinct digits when written in base-b without leading zeros.

Can you see a way to solve it without having to handle any special cases?

The final round completed the competition on Saturday (problems, results on the left, broadcast, my screencast, analysis). I got stuck in the easy problem, trying to be too clever and to avoid constructing the graph explicitly backfired, and I could not debug my solution during the round. I am notoriously bad at giving up on a problem, and it always felt that I was so close :) I have eventually switched to the hard problem and almost got it right — but only almost. The solution I submitted at the end did not even pass the samples, and it turned out that even though it had 5 different bugs, all were of "j instead of i" kind and probably preventable if I was coding it in less of a rush.

This is exactly what lyrically did, as she started with the hard problem and spent most of the contest solving it, and ended up being the only competitor to do so. In fact, the situation was a bit more complicated as her solution has initially failed my challenge case as well, but it turned out that my challenge case has exposed an issue with the problem itself: it seemed virtually impossible to return an output that would pass the checker because of floating-point precision issues. The admins were put in a difficult spot, and ultimately decided to let me keep the 50 challenge points, but also lowered the required precision and let lyrically's solution pass.

I did not realize myself that the problem was broken, and simply tried to put together a challenge case that could expose potential precision issues in other solutions.

In any case, all this did not affect the first place as tourist was the undisputed champion with good times on easy and medium. Huge congratulations!

Right after the final round, scott_wu and ecnerwal ran a fun double-elimination competition in a new 1:1 format called Lockout (more info, results on the left), trying to make competitive programming more exciting to watch. The competition was indeed very fast-paced and had a lot of plot twists depending on the strategy choices. The format before the last round has placed quite a lot of emphasis on the speed of solving easy problems, though, as all problems cost 1 point. The last round of the competition, the Grand Finals where tourist met Um_nik, took place much later, had different point values and a very nice broadcast — check it out! And of course congratulations to tourist on winning the whole thing :)

In my previous summary, I have mentioned an Open Cup problem: there are n drones (n<=500000), the i-th drone initially in the point (xi,yi,0). Some pairs of drones are connected with cables, those cables are straight lines and they do not intersect except at endpoints. We are then performing k moves (k<=500000), in each move we change the z-coordinate of one of the drones (x- and y-coordinates always remain constant). The cables connected to this drone have to change their length correspondingly, and each cable has a known maximum length — in case it needs to become longer than that, it breaks and stays broken for the remainder of the process. Finally, you are given q (q<=500000) queries, each query being a pair of drones. For each query, you need to find the move which caused the two given drones to become disconnected (using the cables), if any.

If we find out for each cable at which move does it break, the processing of q "when did these two vertices become disconnected" queries offline is a standard problem that is solved by going in reverse order of time and keeping a set of connected components and for each component keeping a list of queries with at least one end in it, and traversing this list for the smaller component when merging two of them.

Finding out the moment each cable breaks is the more interesting part of the problem. A naive approach would traverse all cables connected to the vertex being moved at each step, but that would of course be quadratic in case our graph is a star.

How can we improve the performance of our solution on the star graph? We can notice that we can process the outer vertices of the star quickly as the have only one incident edge, the slowdown only arises when we move the center vertex. In order to process the movements of the center vertex faster, we can notice that for each particular edge, the values of z-coordinate where the corresponding cable does not break form a segment, therefore there are two interesting values — the endpoints of this segment — which should trigger the breakage. We can put all interesting values for the center vertex into a data structure, then whenever we move the center vertex we can just query the data structure to retrieve all interesting values that we have passed when moving from the old value of z-coordinate to the new one, and break the corresponding edges.

Since every edge breaks at most once, this will have the total running time of O(m*logm) where m is the number of edges, and assuming the data structure can retrieve the next edge to break in O(logm). We can use a balanced tree or a pair of priority queues as the data structure.

Note that we should also update this data structure for the center vertex when processing the outer vertices, as the interesting values for the corresponding edge will change, but since each outer vertex has only one incident edge this is also O(logm).

How do we generalize this improvement from the star graph to an arbitrary graph? Let's choose an orientation for each edge. Whenever we move a vertex, we will process its outgoing edges directly. For the incoming edges, we will maintain the data structure with the interesting values of z-coordinate and query it for the new breakages. We also need to update this data structure for the neighboring vertices for each outgoing edge. If the maximum out-degree of a vertex is d, this solution runs in O((k*d+m)*logm).

Now the standard idea would be to orient each edge from the vertex of the smaller degree to the vertex of the larger degree. This would mean that the out-degree of each vertex is at most sqrt(2m), since there can't be more than sqrt(2m) vertices each of degree sqrt(2m) adjacent to a vertex, since the sum of all degrees is 2m. I have managed to squeeze this solution in, but it required some extra optimizations.

However, we can make use of the fact that the cables do not intersect, in other words that our graph is planar. Each planar graph has a vertex of degree at most 5. So we can do the following: repeatedly take the vertex with the smallest number of adjacent non-oriented edges, and orient those edges from this vertex. This will give us an orientation where all out-degrees do not exceed 5, so our complexity becomes just O((k+m)*logm) and no squeezing is required :)

Finally, now is a good time to remind you about the New Year Prime contest, where you can try your hand at solving the problems from 2019 that were not solved by anybody in contest time. Best of luck, and Happy New Year!