Monday, October 1, 2012

TopCoder Open: problem difficulty

Tomorrow is the first day of TopCoder Open 2012, and I'll try to blog on its algorithm track.

Here's a small analysis of TopCoder Open problem difficulty.

Let's consider the time taken to solve the problem as the ultimate measure of problem difficulty. Then, we can estimate the difficulty of, say, a medium problem in a TopCoder Open semifinal by comparing the solving time for this problem for each competitor with the solving time for other medium problems for the same competitor. More precisely, for each competitor in each TopCoder Open onsite round, I've compared the solving time for each problem with the solving time for this competitor in all rounds that took place in the same year, and the percentage of problems that were easier plus half the percentage of problems that were of the same difficulty is the declared measure of difficulty of the onsite problem for this competitor. The median of those numbers over all competitors is the difficulty of the onsite problem.

Here's the result:

TCO '03 Semifinals 1 98% 69% 79%
TCO '03 Semifinals 2 89% 81% 73%
TCO '03 Semifinals 3 92% 87% 68%
TCO '03 Semifinals 4 82% 78% 82%
TCO '03 Finals 91% 90% 81%
TCO04 Semifinal 1 87% 80% 67%
TCO04 Semifinal 2 92% 79% 67%
TCO04 Semifinal 3 84% 51% 64%
TCO04 Wildcard 66% 88% 72%
TCO04 Finals 86% 79% 87%
TCO05 Semi 1 91% 76% 64%
TCO05 Semi 2 90% 71% 64%
TCO05 Semi 3 78% 81% 64%
TCO05 Wildcard 88% 81% 64%
TCO05 Finals 91% 77% 74%
TCO06 Semi 1 93% 77% 69%
TCO06 Semi 2 75% 77% 68%
TCO06 Semi 3 88% 77% 70%
TCO06 Wildcard 66% 84% 71%
TCO06 Finals 66% 85% 77%
TCO07 Semi 1 65% 79% 68%
TCO07 Semi 2 89% 77% 66%
TCO07 Semi 3 86% 83% 65%
TCO08 Semifinal 1 64% 81% 62%
TCO08 Semifinal 2 85% 79% 63%
TCO08 Semifinal 3 76% 80% 63%
TCO08 Wildcard 94% 44% 66%
TCO08 Championship 96% 86% 64%
TCO09 Semifinal 84% 59% 60%
TCO09 Championship 95% 87% 56%
TCO10 Semi 1 89% 84% 61%
TCO10 Semi 2 87% 75% 53%
TCO10 Wildcard 81% 86% 57%
TCO10 Final 90% 54% 61%
TCO11 Semifinal 1 74% 73% 55%
TCO11 Semifinal 2 91% 84% 57%
TCO11 Wildcard Round 87% 90% 56%
TCO11 Championship Round 83% 78% 62%

You can see that the numbers for the hard problem are lower than those for easy and medium; the reason is that I consider "not solved" to be equal to "not solved", and thus when a person solves just 60% of all hards, the perceived difficulty of any problem will not exceed 80%.

With that effect in mind, the above list reveals that 'easy' problems are hovering around 90% mark (more difficult than 90% SRM 'easy' problems), while 'medium' problems are sometimes around 80-85% difficulty but sometimes have huge drops to just average difficulty, around 50%.

Another interesting slice of the same data is the list of competitors ordered by decreasing average difficulty of onsite problems. Here's the list, limited to only those competitors with at least 5 onsite rounds:

PaulJefferys - 6 rounds - 58%
reid - 5 rounds - 59%
tomekkulczynski - 5 rounds - 64%
Eryx - 6 rounds - 65%
gawry - 7 rounds - 66%
Yarin - 5 rounds - 67%
bmerry - 7 rounds - 67%
nicka81 - 5 rounds - 67%
antimatter - 5 rounds - 70%
liympanda - 6 rounds - 70%
ACRush - 8 rounds - 71%
Im2Good - 5 rounds - 71%
Ying - 5 rounds - 71%
ploh - 6 rounds - 71%
grotmol - 6 rounds - 72%
marek.cygan - 10 rounds - 72%
John Dethridge - 9 rounds - 73%
cyfra - 5 rounds - 73%
misof - 5 rounds - 74%
andrewzta - 7 rounds - 75%
tomek - 10 rounds - 75%
SnapDragon - 8 rounds - 77%
Petr - 11 rounds - 81%

So for PaulJefferys and reid, the onsite rounds are actually not much more difficult than normal rounds (I guess partially because they don't do much SRMs, and thus "normal" rounds are the TCO qualification rounds for them), while I'm actually the one who struggles the most at the onsites :)