Sunday, January 13, 2019

A Dilworth week

This week was light on contests, so let me talk about the Codeforces problems I've mentioned in last week's summary. The first one was: you are given a permutation of size n <= 100000, and you need to split it into subsequences such that each subsequence is either monotonically decreasing or monotonically increasing. The number of subsequences needs to be small, but does not need to be minimal. More precisely, your solution should have the optimal worst case performance for any given n: the number of subsequences should not exceed the maximum number of subsequences necessary for any permutation of size n.

One fact that definitely looks helpful is the Dilworth's theorem: it tells us that we either have a long increasing subsequence, or can split our sequence into few decreasing subsequences. More formally, let's define f(n) to be the maximum number of subsequences our solution produces for any permutation of size n. Applying the Dilworth's theorem allows to build a solution with the following recurrence on f():

f(n)=maxk(min(k,1+f(n-k))

where the inner minimum corresponds to the fact that we can either split everything into k decreasing subsequences, or remove an increasing subsequence of length k and apply our solution recursively.

Printing a few first values of f(n) computed the recurrence above, we get:

f(1) = 1
f(2) = 1
f(3) = 2
f(4) = 2
f(5) = 2
f(6) = 3
f(7) = 3
f(8) = 3
f(9) = 3
f(10) = 4
f(11) = 4
f(12) = 4
f(13) = 4
f(14) = 4
f(15) = 5
f(16) = 5
...

Here we can notice that the smallest value of n such that f(n)=k is the k-th triangular number: 1+2+...+k=k*(k+1)/2. Having noticed it, it's relatively easy to prove it by induction.

So we have a solution using Dilworth's theorem, but is it good enough for the problem? Does it have the same worst case performance as the best possible solution for any given n?

The answer is yes, and the triangular numbers together with the recurrence above give us a hint how to see it. We need to come up with some permutation of size 1+2+...+k for which we can prove that it can't be split into less than k increasing or decreasing subsequences. 

Here is one such sequence for k=5: 1, 3, 2, 6, 5, 4, 10, 9, 8, 7, 15, 14, 13, 12, 11.  In other words, we take a decreasing segment of length 1, then append a decreasing segment of length 2 with all numbers bigger than the previous segment, then append a decreasing segment of length 3 with all numbers bigger than the previous segment, and so on until a segment of length k. Consider any split of this sequence into increasing and decreasing subsequences. Suppose it has x increasing subsequences. Each increasing subsequence covers at most one element of each decreasing segment, so for k-x segments with more than x elements at least one number will be left uncovered. But a decreasing subsequence can't have numbers from two different segments, so we will need at least k-x decreasing subsequences to cover the rest, and the total will be at least k.

This proves that our solution does in fact have the optimal worst case performance for any given n. One small thing that the Wikipedia article doesn't give is a constructive way to apply the theorem: finding the longest increasing subsequence is a well-known algorithm, but how do we actually split the sequence into as many decreasing subsequences quickly? Some quick googling helped me find the answer during the contest: in the longest increasing subsequence algorithm, let's call the length of the longest increasing subsequence ending in each element the level of this element. It turns out that the elements of the same level always form a non-increasing (and since we have a permutation, decreasing) subsequence, as otherwise the element to the right would have a higher level, so we just split by level.

The second problem I mentioned turned out to be a repeat from 9 years ago: you are given a nxm matrix such that n*m<=300000, with one character from the set A, C, G or T in each cell. You need to change as few characters as possible (to other characters from the set) in such a way that each 2x2 submatrix has all four different characters.

Given that there are consequently two editorials for it, I don't think I can add much :)

Thanks for reading, and check back next week!

1 comment:

  1. It's a bit surprising that 9 years have passed since I set problems :)

    ReplyDelete