Thursday, June 2, 2011

Binary search and eps in comparisons

The usual wisdom about floating-point comparisons inside binary search is to make comparisons exact, to avoid losing precision (as opposed to comparison using some error tolerance denoted as eps). For example this:

for (int step = 0; step < 100; ++step) {
  double middle = (left + right) / 2;
  if (f(middle) > 0) right = middle; else left = middle;
}

is preferable to

for (int step = 0; step < 100; ++step) {
  double middle = (left + right) / 2;
  if (f(middle) > eps) right = middle; else left = middle;
}

when f is a monotonically increasing function, because even with small eps, there's a danger that the corresponding error in the binary search parameter will be much bigger. On the other hand, even if our comparison is incorrect for equal values due to rounding errors, the binary search will still converge correctly since equal values may only appear in one point and everything will be correct in points very close to it.

But all above is true only if f is monotonically increasing! If it is just non-decreasing, and has a plateau, going without eps gets you hugely incorrect results. This bit me today in the hard problem of SRM 508: my failed solution that passes after adding error tolerance to the last comparison in enough() (TopCoder login required).

I guess the lesson is that when trying to be too clever (instead if adding eps to all comparisons), one needs to be extra careful :-)

8 comments:

  1. Well, technically you're right, but practically what happens is that the first snippet solves f(x) = 0 by finding _any_ such x, while in reality we usually need _largest_ such x (in other words, the boundary between f(x) <= 0 and f(x) > 0). If f(x) = 0 between -infinity and some point x0, and f(x) > 0 when x > x0 (this is what happened in the SRM), then this method may return any value between -infinity and x0 instead of x0 (in the SRM it was -infinity :)). Solving f(x) = eps provides a good approximation to largest x such that f(x) = 0.

    ReplyDelete
  2. Vlad Shcherbina10 May, 2013 11:37

    How precise comparison is worse than approximate one in a presence of plateau?

    Isn't second snipped just solving equation f(x) = eps instead of f(x) = 0?

    ReplyDelete
  3. 1) It is not precise comparison vs. approximate comparison; it is fixed number of comparisons or comparison until the error is is small enough.

    2) It is f(x) <= eps. You cannot  generally solve f(x) = C  exactly with a computer.

    ReplyDelete
  4. Pedro Osório10 May, 2013 11:37

    Thanks for (yet another) nice lesson. I guess when you are pressured to perform at a very high speed (in order to grab the 1st place), you are bound to make one or two mistakes occasionally. I'm still expecting the day you will beat that "magic" 4000 mark =)

    ReplyDelete
  5. Well, blindly adding epsilons will also fail in other situations. The thing that is actually important is keeping a clear invariant: always state a condition such that everything to the left of the place you seek is "good" and everything to the right is "bad", and then maintain the invariant "left is good, right is bad". Then, what you are testing inside the bsearch is the question "is middle good or bad"? And you need to make this test as precise as you can.

    ReplyDelete
  6. I completely agree that the invariant approach is the only way to approach binary search logically. However, I believe I did use exactly this approach - and still failed. In the problem in question, my invariant was "left is bad, right is good". When checking whether middle is good, I've tried to be precise: I've checked that every subset of k cities covers at least k/n fraction of the perimeter, without any eps - exactly so that the comparison is as precise as possible. I've thought that "if for some value of middle the total length is exactly k/n, then for slightly larger value it will be more than k/n, and thus the comparison without eps will produce exact results". And this last statement contained a tiny mistake - it doesn't work for k=n (and only for that case).

    If you could explain how you expected me to apply the approach to this problem more cleanly, that would be great!

    ReplyDelete
  7. Bandishankar10 May, 2013 11:37

    one more concern is about error that might happen while executing "double middle = (left + right) / 2;" . plz refer this
    http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

    ReplyDelete
  8. Because of the way floating point works there will be no information loss. That page is referring to integer overflow, which works differently to exponential overflow.

    ReplyDelete