Tuesday, December 11, 2018

when it should have returned 3.0

Let me put the weekly summaries aside for a moment and talk about the TCO 2018 finals experience, while it's still relatively fresh. As the TCO finals is one of the most important contests of the year, naturally its problems often receive deeper analysis than usual. In this post, I'd like to dive into the easy problem. The correct solution is described in the editorial, so I will focus on the incorrect ones: namely, all solutions submitted during the contest :)

I started to have my suspicions during the contest itself, as I was implementing a rather complicated floating-point solution with quite a few subtractions of possibly close numbers. However, at that point I concluded that, since it was the first problem, the floating-point part most likely checked out.

Then I saw 3 out of 6 solutions fail the system test, which was completely unexpected. Later I watched the recording of Scott's and Lewin's broadcast (which seems to be gone now :( Does anyone have a link?), where Michal and another Michal mentioned the top-down solution, and said that the bottom-up solution that operates on formal piecewise-linear functions is very tricky to get right — but that was exactly the solution that all contestants seemed to implement.

The final realization came when I was looking at Um_nik's code for the problem, trying to understand why it could fail the system tests. It all seemed to check out, but it had one big difference from, say, my own code: it used 64-bit integers instead of doubles to represent the x-coordinates of the junction points of the piecewise-linear function. It is true that all junctions have integer x-coordinates, so this strongly pointed to the 64-bit integers simply overflowing, which was indeed the issue.

But then it occurred to me: if 64-bit integers overflow, and the answer can be very small, how come the double-based solutions have enough precision? Armed with the knowledge that super large x-coordinates are possible, I took another look at my solution, and the potentially problematic line immediately stood out:

return value[pos] + slope[pos] * (at - control[pos]);

Here I'm trying to evaluate a linear function when x=at. There are actually two subtractions in this line: one denoted with a minus, and one with a plus :)

The at - control[pos] one is actually not an issue by itself, as both at and control[pos] are integers, and the best answer is always found for relatively small values of at, so when control[pos] is also small this subtraction is exact, and when it's big we're not subtracting two close numbers anymore.

However, the plus is actually an issue. Suppose at is small and the value of the linear function there is also small, but both control[pos] and value[pos] are big. Then we're subtracting one huge number (-slope[pos] * (at - control[pos])) from another (value[pos]) to get a tiny result, and this is where a small relative error turns into a huge absolute error.
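To make this concrete, here's a tiny standalone sketch (with made-up numbers, not from an actual testcase): evaluating the line f(x) = x + 3 at x = 1 through a far-away junction point loses everything to cancellation, while the nearby junction is exact.

```java
public class CancellationDemo {
    // Evaluate f at x = at via the junction (control, value) with the given slope.
    static double interpolate(double value, double slope, double control, double at) {
        return value + slope * (at - control);
    }

    public static void main(String[] args) {
        double big = Math.pow(2, 60);  // a huge x-coordinate, exactly representable
        // The line f(x) = x + 3, stored via two junctions: a near one (0, 3)
        // and a far one (2^60, 2^60 + 3). Note that 2^60 + 3 already rounds
        // to 2^60 in a double (the significand has only 53 bits), so the far
        // junction's y-coordinate is off by 3 before we even start.
        double viaFar  = interpolate(big + 3, 1.0, big, 1.0);  // huge + (-huge)
        double viaNear = interpolate(3.0, 1.0, 0.0, 1.0);      // small + small
        System.out.println(viaFar);   // 0.0: catastrophic cancellation
        System.out.println(viaNear);  // 4.0: exact
    }
}
```

The final addition cancels the two huge parts, leaving 0 instead of the true f(1) = 4.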

In order to construct a concrete testcase, we need to understand how to get a segment in our piecewise-linear function such that one end has both coordinates small, and the other end has both coordinates big. We can notice that if we take any rooted tree and its corresponding piecewise-linear function, and add a new root which has two children, the old tree and a new leaf with value 0, then what happens is that the x-coordinates of all junction points get multiplied by 2, and the y-coordinates of the junction points get the absolute value of corresponding x-coordinate added to them.

Let's start with a tree with a root and two leaves, with values -1 and 1. Its piecewise-linear function has junction points (-2,2) and (2,2). Now we grow it using the above trick n times, obtaining a caterpillar (see the picture on the right), with the piecewise-linear function defined by the junction points (-2^(n+1), 2^(n+1)), (0,2) and (2^(n+1), 2^(n+1)). Everything so far is computed exactly right, as powers of two are representable in a double.
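This doubling step is easy to sanity-check with exact arithmetic (this only verifies the junction formula, it is not a solution to the problem; grow is my illustration helper): add |x| to each junction's y, then double its x, and compare with the closed form.

```java
import java.math.BigInteger;

public class CaterpillarCheck {
    // One expansion step: y += |x|, then x *= 2, applied to every junction.
    static BigInteger[][] grow(BigInteger[][] junctions) {
        BigInteger[][] next = new BigInteger[junctions.length][2];
        for (int i = 0; i < junctions.length; i++) {
            next[i][1] = junctions[i][1].add(junctions[i][0].abs());
            next[i][0] = junctions[i][0].multiply(BigInteger.valueOf(2));
        }
        return next;
    }

    public static void main(String[] args) {
        // Start from the n = 0 junctions: (-2, 2), (0, 2), (2, 2).
        BigInteger[][] f = {
            {BigInteger.valueOf(-2), BigInteger.valueOf(2)},
            {BigInteger.ZERO, BigInteger.valueOf(2)},
            {BigInteger.valueOf(2), BigInteger.valueOf(2)}
        };
        int n = 100;  // far beyond both 64-bit range and double precision
        for (int step = 0; step < n; step++) f = grow(f);
        BigInteger p = BigInteger.valueOf(2).pow(n + 1);  // expected 2^(n+1)
        System.out.println(f[0][0].equals(p.negate()) && f[0][1].equals(p));       // true
        System.out.println(f[1][0].signum() == 0
                && f[1][1].equals(BigInteger.valueOf(2)));                         // true
        System.out.println(f[2][0].equals(p) && f[2][1].equals(p));                // true
    }
}
```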

Now we do one last caterpillar expansion step, but the new leaf has a value of either 1 or -1, depending on which exact bug we're trying to exploit. This causes the solution to evaluate the above linear function at the point -1 or 1. If it uses the right boundary of the segment to do the interpolation, then having value 1 in the new leaf will cause it to subtract, roughly speaking, 2^(n+1)-3 from 2^(n+1), which is bound to return 0 instead of 3 when n is big enough. Similarly, if a solution interpolates using the left boundary, then -1 will do the trick.

And sure enough:
Petr: The method returned 0.0 when it should have returned 3.0
tourist: The method returned 1.57412216096E21 when it should have returned 3.0
ACRush: The method returned 0.0 when it should have returned 3.0

In other words, all submitted solutions were incorrect, and the judges were completely right in saying that the piecewise-linear approach is super tricky to get right. Luckily, even if the system tests had caught all of this, it would have had no bearing on the top 4 in the standings.

Having identified the problematic line, and having understood the mechanics of the huge testcases, it's now relatively easy to come up with a fix: we just need to make sure that the plus in that line is always an addition, and never a subtraction. This is achieved by using the smaller of the two y-coordinates of endpoints of the segment for interpolation, or practically by changing the if condition from

if (pos < control.length) {
    return value[pos] + slope[pos] * (at - control[pos]);
} else {
    return value[pos - 1] + slope[pos] * (at - control[pos - 1]);
}

into

if (slope[pos] < 0) {
    return value[pos] + slope[pos] * (at - control[pos]);
} else {
    return value[pos - 1] + slope[pos] * (at - control[pos - 1]);
}

in my solution. I'm reasonably certain the solution is correct after this fix, but feel free to prove me wrong and come up with another challenge case :)
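Here is the same idea as a tiny self-contained sketch (evaluate is my illustration helper, not the actual contest code): for an ascending segment we interpolate from the left junction, whose y-coordinate is the smaller one, so the final operation adds two same-sign quantities instead of cancelling.

```java
public class FixDemo {
    // Evaluate the line through junctions (cl, vl) and (cr, vr) with slope s
    // at x = at, choosing the junction so that the final operation is an
    // addition: the right junction for a descending segment, the left one
    // for an ascending segment (mirroring the slope[pos] < 0 check above).
    static double evaluate(double cl, double vl, double cr, double vr,
                           double s, double at) {
        if (s < 0) {
            return vr + s * (at - cr);
        } else {
            return vl + s * (at - cl);
        }
    }

    public static void main(String[] args) {
        double big = Math.pow(2, 60);
        // Ascending segment of f(x) = x + 3 from (0, 3) to (2^60, 2^60 + 3):
        // interpolating from the right junction cancels catastrophically,
        // while the slope-based choice uses the left junction and is exact.
        double naive = (big + 3) + 1.0 * (1.0 - big);            // 0.0
        double fixed = evaluate(0.0, 3.0, big, big + 3, 1.0, 1.0);  // 4.0
        System.out.println(naive);
        System.out.println(fixed);
    }
}
```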

Note that there's one more small thing my solution does right from the precision point of view: I'm keeping some redundant information in my representation of a piecewise-linear function, storing ~3n values for a function with ~2n degrees of freedom. This allows me to evaluate each segment independently, and thus not care about precision errors on huge values which have no bearing on the final answer, which is always achieved at a relatively small point. Had I, for example, kept the entire control and slope arrays but only the left-most element of value, and used running sums to get the rest, I would get precision issues even with the above fix, as the low y-coordinates would already be computed incorrectly. It is possible to have only ~2n values without precision issues (i.e. keep only value and control, and the slope only before the first and after the last control point), but choosing such a representation correctly requires some understanding of the precision issues, while having some redundancy allows one to avoid more potentially problematic computations.

Thanks for reading, and check back for more weekly summaries as I still hope to catch up with 2018 in 2018!

Sunday, November 18, 2018

A fjzzq2002 week

TopCoder SRM 737 started the competitive Sep 17 - Sep 23 week (problems, results, top 5 on the left, analysis). There was quite a bit of challenge phase action at the top, but the final results mostly reflect the coding phase order. Congratulations to fjzzq2002 on the win!

Codeforces Round 511 followed (problems, results, top 5 on the left, analysis). eds467 was quite a bit slower than the rest of the top scorers on the first three problems, but managed to squeeze in problem E in 982 ms out of the 1-second time limit and won. Well done!

Open Cup 2018-19 Grand Prix of SPb opened a busy Sunday (results, top 5 on the left). Team Past Glory managed to solve Artur Riazanov's trademark "formal logic" problem B with a few minutes to go, cementing the first place they had earned by solving the other problems fast. Well done!

Finally, Codeforces Round 512 started even before the Open Cup ended (problems, results, top 5 on the left, analysis). This time fateice was actually able to solve the hardest problem E, but the five incorrect attempts they needed to pass pretests meant that they ended up below the competitors who solved problem D instead. Nevertheless, congratulations to fateice on solving E and to fjzzq2002 on the second victory of the week!

In my previous summary, I have mentioned an AtCoder problem: construct any 500 by 500 matrix of distinct positive integers up to 10^15, such that if we take any two vertically or horizontally adjacent numbers a, b in the matrix and compute max(a,b) mod min(a,b) we always get the same non-zero result.

After playing with the problem a bit, one can notice that it's helpful to separate the numbers into big and small numbers (in any given pair of adjacent numbers, we call the dividend max(a,b) the big number, and the divisor min(a,b) the small number). If we put the big numbers into the black cells of a chessboard pattern, and the small numbers into the white cells of the same pattern, then in any pair of adjacent numbers one will be big and one will be small, just as we need.

Suppose we have chosen the small numbers. Then we can pick each big number as the least common multiple of the adjacent small numbers plus one, and max(a,b) mod min(a,b) will always be equal to one. However, that's not yet a solution, as we need to balance two conflicting needs:
  • The small numbers need to be big enough to be distinct and for the resulting least common multiples to also be distinct.
  • The least common multiples, however, need to be small enough to fit under 10^15.
Naively picking the small numbers randomly, as they need to be distinct, would result in numbers on the order of 10^5, so the products of four numbers would end up being on the order of 10^20, and the least common multiple of random numbers is not too unlikely to be equal to their product.

That means we need to find a way for the least common multiple of four numbers to be significantly smaller than their product, which means that they must have large common divisors. One way to achieve that is to pick a number a_i for each row, a number b_j for each column, and put a_i*b_j as the small numbers. Then the four small numbers surrounding a big number at position (i,j) are a_{i-1}*b_j, a_{i+1}*b_j, a_i*b_{j-1}, and a_i*b_{j+1}. They have quite a few common divisors, and their least common multiple is at most a_{i-1}*a_i*a_{i+1}*b_{j-1}*b_j*b_{j+1}. If each a_i and b_j is on the order of 10^3 (we have 500 rows + 500 columns), the product of six such numbers is on the order of 10^18, two orders of magnitude better than the naive approach but still not good enough.

We can notice that among the above four numbers only a_i and b_j were repeated, thus contributing to the reduction of the least common multiple, while a_{i-1}, a_{i+1}, b_{j-1}, and b_{j+1} appeared only once; this suggests a way to improve the solution: instead of assigning numbers to rows and columns, we need to assign them to diagonals containing small numbers. Each small number is at the intersection of two diagonals, and we still have just 1000 diagonals in total. Since each big number is surrounded by only four diagonals, its least common multiple will be equal to a product of only four diagonal numbers, and thus be only on the order of 10^12.

Since we need all products and least common multiples to be distinct, we can't actually assign the numbers from 1 to 1000 to the diagonals, but we can take the first 1000 prime numbers: the first 500 go to the diagonals of one kind, and the next 500 to the other kind. The 1000-th prime number is 7919, the 500-th prime number is 3571, so our least common multiples will not exceed 7919*7919*3571*3571, which is less than 10^15, and the diagonal numbers being distinct primes guarantees that all numbers in the matrix will be distinct, so we're done!
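The whole construction fits in a short program (my own standalone sketch, not the contest submission; build, firstPrimes and the checks in main are illustration helpers, and the exact diagonal indexing is one possible choice):

```java
import java.util.HashSet;
import java.util.Set;

public class MatrixConstruction {
    static final int N = 500;
    static final long LIMIT = 1_000_000_000_000_000L;  // 10^15

    static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

    // First 2*N primes by trial division: primes[0..N-1] go to one family of
    // diagonals, primes[N..2N-1] to the other.
    static long[] firstPrimes(int count) {
        long[] primes = new long[count];
        int found = 0;
        for (long x = 2; found < count; x++) {
            boolean prime = true;
            for (long d = 2; d * d <= x; d++) if (x % d == 0) { prime = false; break; }
            if (prime) primes[found++] = x;
        }
        return primes;
    }

    static long[][] build() {
        long[] primes = firstPrimes(2 * N);
        long[][] m = new long[N][N];
        // Small numbers on "white" cells: product of the two diagonal primes.
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if ((i + j) % 2 == 0) {
                    int d1 = (i + j) / 2;          // one diagonal family
                    int d2 = (i - j) / 2 + N / 2;  // the other family
                    m[i][j] = primes[d1] * primes[N + d2];
                }
        // Big numbers on "black" cells: lcm of the adjacent small numbers, plus one.
        int[] di = {-1, 1, 0, 0}, dj = {0, 0, -1, 1};
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if ((i + j) % 2 == 1) {
                    long lcm = 1;
                    for (int d = 0; d < 4; d++) {
                        int ni = i + di[d], nj = j + dj[d];
                        if (ni < 0 || ni >= N || nj < 0 || nj >= N) continue;
                        lcm = lcm / gcd(lcm, m[ni][nj]) * m[ni][nj];
                    }
                    m[i][j] = lcm + 1;
                }
        return m;
    }

    public static void main(String[] args) {
        long[][] m = build();
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                if (m[i][j] > LIMIT || !seen.add(m[i][j]))
                    throw new AssertionError("bad value at " + i + "," + j);
                // Every adjacent pair must give max mod min == 1.
                if (i + 1 < N && Math.max(m[i][j], m[i + 1][j]) % Math.min(m[i][j], m[i + 1][j]) != 1)
                    throw new AssertionError("bad pair at " + i + "," + j);
                if (j + 1 < N && Math.max(m[i][j], m[i][j + 1]) % Math.min(m[i][j], m[i][j + 1]) != 1)
                    throw new AssertionError("bad pair at " + i + "," + j);
            }
        System.out.println("all checks passed");
    }
}
```

Note that each big number is strictly larger than its small neighbors (it is one more than a multiple of each of them), so max mod min is always exactly 1.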

I have also mentioned an Open Cup problem in the previous summary: the judging program has a hidden string of 1000 digits, each either 0 or 1. In one query, you ask about a segment [l,r], and the judging program returns one of the two things, each with probability 50%:
  • the number u of 1s in the given segment
  • a uniformly chosen random integer between 0 and r-l+1 that is not equal to u.
In other words, with probability 50% the judging program gives an incorrect answer to your query. Your goal is to reconstruct the hidden string using at most 18000 queries, with one more important restriction: you are also not allowed to ask the same query twice.

This problem has been discussed quite extensively in the post-match discussion (this is problem E), including my own solution sketch, so I will refer interested readers there :)

Thanks for reading, and check back for more!

Thursday, November 15, 2018

An interactive week

AtCoder has returned with its Grand Contest 027 during the Sep 10 - Sep 16 week (problems, results, top 5 on the left, my screencast, analysis). There was a pretty tense fight for the second place with cospleermusora getting problem B accepted with less than a minute remaining; but tourist's victory was not really in doubt as he finished all problems with 25 minutes to spare. Congratulations to both!

I've really enjoyed solving problem D (the choice of constructive problems for this blog is becoming quite a pattern, isn't it?): you need to construct any 500 by 500 matrix of distinct positive integers up to 10^15, such that if we take any two vertically or horizontally adjacent numbers a, b in the matrix and compute max(a,b) mod min(a,b) we always get the same non-zero result.

The second Open Cup stage, the Grand Prix of Udmurtia, followed on Sunday (results, top 5 on the left). Team Havka-papstvo had dug themselves into a hole thanks to having a lot of incorrect attempts, then marvelously escaped with just 8 minutes remaining by solving the most difficult problem. Congratulations on the victory!

The Grand Prix of Udmurtia was a pioneer of interactive problems in the past, and this incarnation had four of those, too. Problem E went like this: the judging program has a hidden string of 1000 digits, each either 0 or 1. In one query, you ask about a segment [l,r], and the judging program returns one of the two things, each with probability 50%:
  • the number u of 1s in the given segment
  • a uniformly chosen random integer between 0 and r-l+1 that is not equal to u.
In other words, with probability 50% the judging program gives an incorrect answer to your query. Your goal is to reconstruct the hidden string using at most 18000 queries, with one more important restriction: you are also not allowed to ask the same query twice.

In my previous summary, I have mentioned another problem with segment queries: there's a hidden integer between 1 and 10^18. You can make queries, and in one query you give a segment [l,r] and the judging program tells you whether the hidden number lies in that segment. In case it does, and your segment has length 1 (l=r), you win. After each query, the hidden number can change a bit — more precisely, by at most k=10. These changes do not depend on your queries — in each testcase the sequence of changes is always the same. You have at most 4500 queries to win.

If the hidden number did not change, we would do a binary search: by querying the segment [1,m] we can compare the hidden number with m, so if we knew that our number was within some segment [a,b] before our query, we would narrow it down to either [a,m] or [m+1,b] after this query, and determine our number after at most ceil(log2(10^18))=60 queries.

When the hidden number changes under the hood, we need to adapt this approach: now we go from [a,b] to either [a-k,m+k] or [m+1-k,b+k]. When the segment [a,b] is big, this still divides it roughly in half, so we can proceed as before. However, when it becomes small, it will actually stop decreasing, and will never converge to a segment of length 1.

So we will do the following: when the length b-a+1 of the current segment is bigger than some boundary B, we will divide it in two using the above approach. And when it's B or less, we will just pick a random number c within the segment, and send the query [c,c]. With probability at least 1/B, we will win in this case. In case we don't, our candidate segment grows from [a,b] to [a-k,b+k], and we continue as before.

It's important to pick the right value of B: if it's too big, the probability of winning with each attempt of the second kind would be too low, and we won't always finish under 4500 queries. And if it's too small, it will take too many queries of the first kind between two queries of the second kind to reduce the segment size, so we would have too few queries of the second kind and also wouldn't finish under 4500 queries. It's probably possible to find the mathematically optimal value of B, or we can take a guess (I used B=99 during the contest) and verify that it leads to a good enough probability of finishing under 4500 queries.
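To get some confidence that a boundary of 99 works, the strategy can be stress-tested against a toy simulated interactor (my own harness; the random drift model below is invented for the simulation, while in the real problem the changes were fixed per testcase):

```java
import java.util.Random;

public class DriftingSearch {
    static final long MAX = 1_000_000_000_000_000_000L;  // 10^18
    static final int K = 10, QUERY_LIMIT = 4500, BOUNDARY = 99;

    // Returns the number of queries used to win, or -1 on failure, against a
    // hidden number that drifts by at most K after every query.
    static int solve(long hidden, Random rnd) {
        long a = 1, b = MAX;  // invariant: hidden is always inside [a, b]
        for (int queries = 1; queries <= QUERY_LIMIT; queries++) {
            if (b - a + 1 > BOUNDARY) {
                // Query of the first kind: does the hidden number lie in [a, m]?
                long m = a + (b - a) / 2;
                if (hidden <= m) b = m; else a = m + 1;
            } else {
                // Query of the second kind: guess a random point of the segment.
                long c = a + (long) rnd.nextInt((int) (b - a + 1));
                if (hidden == c) return queries;  // the [c, c] query wins
            }
            // The hidden number drifts by at most K, so the candidate
            // segment has to expand accordingly.
            hidden = Math.max(1, Math.min(MAX, hidden + rnd.nextInt(2 * K + 1) - K));
            a = Math.max(1, a - K);
            b = Math.min(MAX, b + K);
        }
        return -1;  // did not win within the limit
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int worst = 0;
        for (int t = 0; t < 100; t++) {
            int used = solve(1 + (long) (rnd.nextDouble() * (MAX - 1)), rnd);
            if (used < 0) throw new AssertionError("exceeded the query limit");
            worst = Math.max(worst, used);
        }
        System.out.println("worst case over 100 runs: " + worst + " queries");
    }
}
```

With these constants a halving step loses at most 20 units of progress to the drift, so the segment still shrinks geometrically until it is short, and afterwards every other query is a guess that wins with probability at least 1/99.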

Thanks for reading, and check back for more!

Wednesday, November 14, 2018

A too difficult week

The Sep 3 - Sep 9 week started with Codeforces Round 507 on Wednesday (problems, results, top 5 on the left, my screencast, analysis). Once again the hardest problem was not solved during the round, and thus it all came down to the speed on the first four problems. Um_nik was considerably faster than the competition, finishing the four problems under an hour, and thus got a well-deserved first place. Congratulations!

Problem B in this round added a nice twist to a standard setting. This is an interactive problem in which you need to find a hidden integer between 1 and 10^18. You can make queries, and in one query you give a segment [l, r] and the judging program tells you whether the hidden number lies in that segment. In case it does, and your segment has length 1 (l=r), you win. So far this is a classical binary search problem.

Here comes the twist: after each query, the hidden number can change a bit — more precisely, by at most 10. These changes do not depend on your queries — in each testcase the sequence of changes is always the same.

Can you see how to adapt the binary search for this setup? You have at most 4500 queries to win.

On Sunday, the new season of the Open Cup kicked off with the Grand Prix of Zhejiang (results, top 5 on the left). This will most likely be the most brutal contest of the year :) As I understand, this was the first "big" contest set by Yuhao Du, and the scoreboard reminds me of my own first Petrozavodsk contest, or my second TopCoder SRM. Team japan02 chose the solvable problems well and earned the victory with quite some margin. Well done!

Thanks for reading, and check back for more!

A plus four week

Codeforces hosted two rounds during the Aug 27 - Sep 2 week. AIM Tech Round 5 took place on Monday (problems, results, top 5 on the left, analysis). All problems were solvable this time for LHiC and OO0OOO00O0OOO0O00OOO0OO, but Mikhail was the considerably faster of the two. Congratulations on the win!

Manthan, Codefest 18 was the second Codeforces round of the week (problems, results, top 5 on the left, analysis). The contest really came down to the wire, with the top three participants all completing the problem set with just a few minutes to go. Tourist was just a tiny bit faster with the easier problems, and thus earned the victory. Well done!

This week also marked the end of the summer Petrozavodsk training camp (results, top 5 on the left), the 9-contest event for top Russian and some other ICPC teams to practice before the new season. Given that the second-placed team in those standings is not actually going to participate in official ICPC contests this year, the gap between the first team (current ICPC World Champions) and the rest of the field is daunting, even though it's still only August :)

In my previous summary, I have mentioned a TopCoder problem: you are given a 10x10 grid and can place at most 150 unit cubes onto it. A cube can be placed onto a cell of the grid, or on top of another cube. Given a number s between 1 and 500, you need to place the cubes in such a way that the surface area of the resulting figure (the total number of sides of the cubes that do not touch the grid or other cubes) is equal to s, or report that it's impossible.

The approach I will describe is a very typical one for such "constructive" problems. Suppose we have found the solution for some value of s. Let's take one extra cube and put it on top of an existing one. This will increase the surface by four (five new sides appear, and one old one is covered), so we'd get a solution for s+4. We can now apply the same trick to get a solution for s+8, s+12 and so on.

Since we spend one cube to increase the surface by 4, and 150*4 is significantly bigger than 500, we won't run out of cubes.

Now the only task that remains is to find the smallest possible figure for each remainder of s modulo 4. This can be done by analyzing a few cases by hand: it turns out the minimum solvable values of s for the four remainders are 8, 5, 10 and 11.

During the round I initially discovered another induction idea: just placing a new cube on the grid in such a way that it's disconnected from the rest adds 5 to the surface, so I tried to build a similar solution modulo 5. However, in that case we do run out of space, as we can have at most 50 independent cubes on the 10x10 grid, and 50*5 is less than 500.
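Both induction steps are easy to check with a small surface-area helper (my own sketch; since cubes go onto cells or onto other cubes, a figure is exactly a grid of column heights):

```java
public class CubeSurface {
    // Surface area of a figure given as column heights: one top face per
    // non-empty column plus the exposed parts of the side faces. Faces
    // touching the grid do not count.
    static int surface(int[][] h) {
        int n = h.length, m = h[0].length, total = 0;
        int[] di = {-1, 1, 0, 0}, dj = {0, 0, -1, 1};
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++) {
                if (h[i][j] == 0) continue;
                total += 1;  // the top face
                for (int d = 0; d < 4; d++) {
                    int ni = i + di[d], nj = j + dj[d];
                    int nb = (ni < 0 || ni >= n || nj < 0 || nj >= m) ? 0 : h[ni][nj];
                    total += Math.max(0, h[i][j] - nb);  // exposed side faces
                }
            }
        return total;
    }

    public static void main(String[] args) {
        int[][] h = new int[10][10];
        h[0][0] = 1;
        System.out.println(surface(h));  // 5: a single cube
        h[0][0] = 2;
        System.out.println(surface(h));  // 9: stacking a cube adds 4
        h[5][5] = 1;
        System.out.println(surface(h));  // 14: a disconnected cube adds 5
    }
}
```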

Thanks for reading, and check back for more!

Tuesday, November 13, 2018

A 250+ week

The Aug 20 - Aug 26 week was very calm compared to the previous ones, with just the TopCoder Open 2018 Online Wildcard Round on Saturday (problems, results, top 5 on the left, parallel round results, analysis). ACRush, Egor and Stonefeang were the only participants to solve all three problems, but ACRush and Egor were almost twice as fast in solving the hardest one, thus qualifying to the TCO onsite with a 250+ point margin. Well done!

The easy problem in this round was cute. You are given a 10x10 grid and can place at most 150 unit cubes onto it. A cube can be placed onto a cell of the grid, or on top of another cube. Given a number s between 1 and 500, you need to place the cubes in such a way that the surface area of the resulting figure (the total number of sides of the cubes that do not touch the grid or other cubes) is equal to s, or report that it's impossible.

Thanks for reading, and check back for more!

A birdie week

TopCoder SRM 736 started the Aug 13 - Aug 19 week (problems, results, top 5 on the left, my screencast, analysis). This was the first round under the new system in which one can qualify for the last online round and even to the onsite round of TopCoder Open 2019 by placing well in all SRMs in a quarter. rng_58 has claimed the early lead in that leaderboard by winning the SRM by less than one full point!

Codeforces then held two rounds based on VK Cup Finals problems, starting with Round 504 on Friday (problems, results, top 5 on the left, analysis). Just as in the official round, the hardest problem remained unsolved, but this time the winner was determined on challenges. ko_osaga found 9 opportunities and got a clear first place as a result. Well done!

Facebook Hacker Cup Round 3 on Saturday selected the 25 lucky finalists going to Menlo Park (problems, results, top 5 on the left, analysis). Xiao was quite a bit faster than the rest of the pack, qualifying to the onsite finals in style. Congratulations to Xiao and to all finalists!

Finally, another VK Cup-based Codeforces Round 505 wrapped up the week (problems, results, top 5 on the left, analysis). Once again the hardest problem remained unsolved, and once again it took solving all remaining problems plus finding a few challenges to earn a victory — and Swistakk did just that.

Thanks for reading, and check back for more!