Tuesday, August 18, 2015

A week with Merlin

Codeforces Round 315 gave this week a great start on Monday (problems, results, top 5 on the left). Congatulations to Nikolay on his second Codeforces victory!

Problem D was a really nice opportunity for everybody to demonstrate their creativity. You were given 100000 lines on a plane, and needed to check if it's possible to pick 5 points such that every given line passes through at least one chosen point.

Even finding all pairwise intersections of those lines would not fit into the time limit, so we have to somehow reduce the search space. There are two main approaches to do it: randomized and deterministic. The main idea of the randomized approach is: if we just take two random lines from our set, how likely is it that they intersect in one of the 5 sought points? The main idea of the deterministic approach is: how many lines do we actually need out of 100000 to find one of the sought cover points or to determine that it's impossible to cover them with 5 points?

By the way (you might already know this, of course) for most programming contest problems you can submit your solutions after the contest ends, just for fun and to check your ideas - and that includes the problems I mention in this post, so feel free to take a stab at them without the contest pressure. Solving the problems after the contest ends is usually called "upsolving" in the Russian community, but the word doesn't feel right to me. How would you call it?

Google Code Jam 2015 Onsite Finals in Seattle was the main event of the week (problems, results, top 5 on the left, live stream recording). Gennady has continued his unbelievable victory streak well into the second year running - amazing talent and consistency! Bruce Merry has solved the same amount of problems including the most difficult one, but two of his solutions turned out to contain small bugs. Both of those problems, somewhat unusually for Code Jam, had large inputs that were substantially different from the small inputs, and thus one could not use the small's instant feedback to verify the solution for the large. That has cost Bruce dearly.

I was the author of problem A "Costly Binary Search", but I think problem E "Merlin QA" was the most beautiful in this round. Without the background story (which was very nice by iself), here's what the task boiled down to. You are given 100 8-dimensional vectors, and you're adding them in one of 100! possible orders, but with a small twist: every time one of the coordinates of the sum is negative, it becomes zero. Here's a 2-dimensional 2-vector example: if you have vectors (5, -2) and (-3, 1), then adding them in the given order yields (0, 0) -> (5, -2) -> (5, 0) -> (2, 1), while adding them in the reverse order yields (0, 0) -> (-3, 1) -> (0, 1) -> (5, -1) -> (5, 0). Your goal is to pick the order of addition that maximizes the sum of coordinates of the resulting vector.

I encourage you to try solving this problem by yourself for now, as the main idea is so beautiful, and I'll come back with the solution next week.

Google Distributed Code Jam 2015 Onsite Finals took place in Seattle one day later (problems, results, top 5 on the left). Once again had Bruce failed two of his large inputs, but this time he was so far ahead of the rest of the field that it hardly mattered. In fact, he could've failed one more large input and still win the contest!

Thanks for reading, and see you next week.

Sunday, August 16, 2015

A week with zero as a divisor

Yandex.Algorithm final round took place online last week (problems in Russian, results, top 5 on the left). Gennady has won the round thanks to a very important quality: he didn't make any bugs in his solutions :) Congratulations!

It was nicely pointed out that I fail to possess this quality for the third year in a row in exactly the same way: by having a bug in problem C.

In 2013, I've submitted C in a hurry without any testing, and it was not a big surprise that it failed. Here's the diff between my submission and the one that passes - as you can see, it's essentially one more corner case not covered by the original submissions:

+    int extraDelta;
+            extraDelta = 0;
+        if (totalP + 2 + 4 * (n - total) == p && oldTotalP - 2 + 4 * (n - total + 1) == p && total >= 2) {
+            have[id] = false;
+            have[99 * MAX + 100] = true;
+            totalP += 2;
+            extra = n - total;
+            extraDelta = 1;
+            printBoundary();
+            return true;
+        }
-            return (r == c && 100-r <= extra);
+            return (r + extraDelta == c && 100-c <= extra);

In 2014, the bug was very stupid, and the best recipe for avoiding its repetition seemed to be simply not making such stupid bugs :) Of course, testing would help tremendously, but the contest format almost rules that out.
-                if (hexamino[r][c] == 1) {
+                if (hexamino[r][c] == sides[1][1][0]) {

This year, the bug is equally stupid, and the fix is equally short:
+            for (int i = 0; i < n; ++i) if (pr[i] == 0) pr[i] = i;

Here's the code snippet where that line makes a difference - its goal is to find any prime divisor for all numbers up to n:
             int n = (int) 1e6;
             int[] pr = new int[n + 1];
             Arrays.fill(pr, 0);
             for (int i = 2; i * i <= n; ++i)
                 if (pr[i] == 0) {
                     pr[i] = i;
                     for (int j = i * i; j <= n; j += i) pr[j] = i;

A very typical question about programming contests goes like this: do the skills you've learned help you in your job as a software engineer? However, my very ugly snippet above begs for the opposite question: do the skills learned on your job as a software engineer help achieve better results in algorithmic programming competitions?

Here are some ways I could've avoided this bug (and thus would've walked away with some more cash), and I think all of them can be learned in a software engineering job:

  • Hardcoding n=1e6 is a bad idea. Had I not done that, I would've noticed the bug on the sample input where n=11 and prime number 11 is actually used in the output. I did have the array up to the value of n from the input at one point, but then changed to the fixed 1e6 boundary to make sure a possible timeout shows immediately on the first testcase.
  • Using two different types of special values (pr[i] == 0 and pr[i] == i) to denote the same thing is a bad idea. I should've just filled with pr[i] == i initially.
  • The previous issue has appeared because I've initially had the code written with boolean array "pr", then converted it to integer array since I needed any divisor in the solution below. Had I thought the solution through before starting coding, this wouldn't have happened.
But why did all those bugs still happen, despite the 8 year software engineering experience? It seems to be a case of over-confidence to me. I was so confident that I could write the solution for this easy problem quickly, that I didn't think it through, and I've made quick patches without thinking about the consequences. Hopefully this is a lesson learned, both for the next year and for the actual software engineering work!

IOI 2015 took place in Almaty one week earlier (problems, results, top 5 on the left). Congratulations to Jeehak Yoon on the full score and the victory, and to Mikhail Ipatov on the close pursuit! This year's problemset consisted of 4 relatively normal format problems, and two somewhat unusual ones.

The first unusual problem, scales, asked to come up with a strategy to sort 6 coins using a very weird comparison that takes 3 or even 4 coins as inputs and has three possible results of each comparison (see the problem statement for more details). Since the number of coins was extremely small - just 6 - solving this problem boiled down to manually experimenting on your own machine with various comparisons and sets of outcomes that appear, as I understand.

The second unusual problem, towns, challenged one with learning some properties about a hidden tree using a small amount of requests of distances between tree's leaves. This problem also requires one to come up with an algorithm, and the unusual side is the fact that you get to pick which part of input you want to see.

I love that the IOI continues to pick such unusual problems, and hope that they'll continue to do so!

Thanks for reading, and check back soon for this week's summary.

Thursday, August 6, 2015

A week that's never sorry

The problems and results of last week's VK Cup 2015 have been published after the online mirror took place on Thursday (problems, results, top 5 on the left, online mirror results with shuffled problems). Congratulations to Team Never Sorry who have squeezed out the first place by being the only team to solve problem C, taking advantage of the dynamic scoring system. The problem in question wasn't extremely difficult - it required one nice idea that gives most of the solution and then a lot of careful logical reasoning to figure out all corner cases: consider a tree, and for each vertex find the set of vertices at distance at most 2 from it. You're given those n sets in random order (so you don't know which vertex each set corresponds to), and you need to reconstruct the orignial tree. The size of the tree is at most 1000.

TopCoder SRM 664 wrapped up this week on Saturday with three mathy problems (problems, results, top 5 on the left, my screencast). The easiest one went like this: we have two numbers, x and y, and repeatedly double the smaller number, subtracting it from the larger number at the same time, so that the sum stays the same. For example, if we start from (1, 6), then we get (2, 5), then (4, 3), then (1, 6) again and so on. Which two number are we going to get after two billion steps? x and y are up to 1 billion.

Many people have noticed that several examples all lead to a periodic sequence quickly, and implemented general period-finding algorithms. However, it turns out that the period can be rather large, and we got an eventful challenge phase.

In fact, it's an interesting (and not very difficult) question by itself: what is the largest period length under the given constraints?

Thanks for reading, and check back next week!

Monday, July 27, 2015

A week before IOI

Codeforces Round 313 has set the tone for an active week after a few very calm ones (problems, results, top 5 on the left). Only TooSimple and qwerty787788 have managed to solve all problems, but the former has picked much better problem solving sequence, and thus won convincingly - great job, and great preparation for the upcoming IOI in Kazakhstan!

TopCoder SRM 663 took place 25 hours later (problems, results, top 5 on the left, my screencast). Subscriber has found two crucial challenges and claimed the victory - great job! Of course, the challenge phase would have much less importance had myself or anybody else managed to solve the hardest problem: we have n binary variables a1,a2,...,an, each initialized randomly, independently and uniformly to 0 or 1. Then, we repeatedly pick two random indices i and j such that i < j, and set overwrite aj with the value of ai. This process will stabilize as soon as all variables have the same value. Before that happens, what's the expected number of times the given sequence of variable values (n numbers, 0 or 1 each, corresponding to the n variables in order) appears?

I've spent an awful lot of time digging in various wrong directions, and with just about 10 minutes to go in the coding phase a good idea came to my mind: simulate the probability of getting each sequence of 0s and 1s for a small value of n after each step. This simulation revealed a very unexpected observation (which can be proved by induction): the probability of seeing any given sequence at any given step depends only on the number of 0s/1s in the sequence, and not on the positions of those 0s and 1s!

Can you figure out the rest? It is relatively easy, but I needed quite some time to finish the implementation after the round - it took me maybe 20 more minutes to get working.

VK Cup 2015 Finals have also happened today, but the results or problems are not available online yet - we can just look at Daria' twitter so far.

Thanks for reading, and check back next week!

Saturday, July 25, 2015

A week when easy is enough

Last week TopCoder Open 2015 Round 2C presented three tricky, if a tiny bit standard, problems (results, top 5 on the left, my screencast, parallel round results). As a result, a fast solution for the easy problem was enough to qualify for the next round, just like in old days :) Of course, a successful challenge together with a not-so-fast solution was also enough. Congratulations to everybody who qualified!

The medium problem has tripped the most competitors, with just 2 correct solutions out of 54 submitted. It went like this: you're placing 8 segments of given lengths into a longer segment of given length one by one in the given order. The smaller segments must lie inside the longer segment, and must not intersect. You can pick any position for a small segment that does not contradict the above rule. When there's no such position, the small segment is not placed into the large segment at all (but if there's at least one possible position, you have to pick one). For each small segment you need to return one of three outcomes:
  • it will definitely be placed into the longer segment irrespective of how previous segments are placed, or
  • it will definitely not be placed into the longer segment, or
  • both situations can occur depending on placement of previous segments.
At first sight, the problem seems straightforward: we probably need to compare some sums of lengths of small segments with the length of the large segment. However, after some more thinking pretty tricky cases come to light: for example, even if some subset of previous segments can fill the large segment completely and thus leave no space for the current segment, it might happen that it's impossible for exactly this subset to appear inside the large segment since we can't skip a segment - we must place it if there's space.

At this point, a solution from the other end of the spectrum comes to mind: what if we try all possibilities of how the small segments are placed, and carefully write down all constraints that appear - i.e., if a segment was not placed, then all gaps currently have strictly smaller length; if a small segment is placed into a gap that is bounded in length, we get two gaps such that their total length is bounded, and so on.

This solution has quite a few different cases to consider, and thus brings a temptation to look for a simpler solution, maybe some invariant that holds or some simple inequalities that we should check.
But it turns out that all (to the best of my knowledge) such simplifications fail, and one simply has to take their time and carefully track the facts about the remaining gaps as we place the segments.

Thanks for reading, and check back tomorrow for this week's summary!

Sunday, July 12, 2015

An associative week

TopCoder SRM 662 took place this week (problems, results, top 5 on the left). With just under 600 competitors, this was the lowest attendance for a TopCoder SRM since SRM 307 which happened more than nine years ago. Of course, the starting time at 3am in Europe and 4am in Russia wasn't exactly helpful.

Nevertheless, the problemsetter cgy4ever and the admins have once again prepared very nice problems for everybody to solve. The hardest problem was not solved by anybody: you want to define an associative (but not necessarily commutative or inversible) binary operation on N elements, and have already decided what should be the results of applying the operation when the first argument is some fixed element X and the second argument is any of the N elements. Is there a way to define the rest of the operation to make it associative?

I haven't solved this problem myself yet, so I will try to at least share my approach. Given a formal set of constraints, we should try to see what can we derive from those constraints. Since we know the result of X*Y for all Y, and know that the operation is associative, we can look at (X*X)*Y=X*(X*Y) and learn the result of (X*X)*Y for all Y. In the same manner, we know the result of the operation if the first argument is X*X*...*X. If the data filled so far already leads to a contradiction, or to a non-associativity, then there's no solution. But can you see what should we do if there's no contradiction?

If I were solving this problem myself during the round, I would probably try to add some heuristic to define the remaining elements of the multiplication matrix, then implement a random testcase generator and see where the solution fails, hoping to obtain some more intuition about the problem.

Thanks for reading, and check back next week!

Just a summer week

There were no big international algorithmic competitions last week (the one that ended on 5th of July), so the time seems right for some boring history digging :)

Between my own archives and the data available on the Internet, my earliest programming contest submission available that scored at least some points seems to be the one to problem 1 of the 1997 Russian Olympiad in Informatics.

The problem was: you're given at most 50 "bad" words of length at most 6, consisting of 3 letters each ("I", "N", and "W" - maybe somebody who was judging that olympiad can shed the light about why those three were chosen?), and each bad word has an associated integer penalty. You need to build a word of the given length (at most 100) consisting of the same three letters, so that the total penalty obtained by summing the penalties for all bad words appearing in it as substrings, is minimized. If a bad word appears multiple times, the penalty is counted multiple times, too.

These days, the problem doesn't seem that difficult, although at that time just 4 contestants have managed to solve it in full, the most famous of which is probably Nikolay Durov. His code uses the dynamic programming approach that seems quite obvious now, although for some reason he uses base-4 representation for the last 5 letters instead of base-3 - maybe to avoid dealing with the first few letters separately.

My own solution did an exhaustive search over all possible output strings, and thus scored just 4 points out of 40. Here it is (note the John Dethridge-style indentation :)

wrd=array[1..MaxM]of strin;
pla=array[1..MaxM]of byte;
Next:array[char]of char;
procedure load;
var f:text;i:byte;s,w:string;cnt:integer;
for i:=1 to M do begin
z[i]:=copy(s,1,pos(' ',s)-1);
delete(s,1,pos(' ',s));
function getpl(gs:string):word;
var i,j,n:byte;pl:word;
for i:=1 to M do begin
  for j:=1 to M-length(z[i])+1 do
    if copy(gs,j,length(z[i]))=z[i] then inc(n);
procedure work;
var mm:word;ps:byte;
if mm<min then begin min:=mm;r:=t; end;
if ps=0 then begin t:='';break; end;
until t[ps]<>'I';
if t='' then break;
until false;
procedure save;
var f:text;

Thanks for reading, and check back later today for this week's summary!