The Semantic Gap

The semantic gap is the gulf between the semantic label we assign to a picture (say, "cat") and the numeric pixel representation the computer actually sees: a giant grid of numbers.

This is a hard problem because very small, subtle changes to the picture cause the pixel grid to change entirely. For example, if the same picture of a cat were taken from a slightly different angle, every single pixel in this giant grid of numbers would be different, yet it would still represent the same cat. Our algorithms must be robust to this.
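The sketch below makes this concrete (a hypothetical example, assuming NumPy and an 8-bit grayscale crop): the computer only ever sees integers, and even a modest lighting change shifts every one of them while the content stays the same.

```python
# A minimal sketch of what the computer actually "sees": a grid of integers,
# with nothing in it labeled "cat". The 4x4 crop below is made up for
# illustration; real images are e.g. 800x600x3 values.
import numpy as np

image = np.array([[ 52,  57,  61,  59],
                  [ 49, 120, 133,  60],
                  [ 47, 140, 152,  58],
                  [ 50,  55,  63,  57]], dtype=np.uint8)

# A small lighting change shifts every pixel value, yet the picture still
# shows the same object.
brighter = np.clip(image.astype(np.int16) + 30, 0, 255).astype(np.uint8)

print(image[1, 1], brighter[1, 1])  # 120 150 -- same cat, different numbers
```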

Challenges

Our algorithms must be robust to variations such as:

  • Lighting: pixel values change drastically under different illumination.
  • Viewpoint Variation: the same object looks different from different camera angles.
  • Deformation: many objects are not rigid and can appear in many poses.
  • Occlusion: only part of the object may be visible.
  • Intraclass Variation: instances of the same class can look very different from one another.
  • Background Clutter: the object may blend into its surroundings.

Our algorithm needs to handle all of these variations simultaneously, while retaining sensitivity to the differences between classes. If we want our computer programs to deal with all of these problems at once, and not just for cats but for nearly any object category you can imagine, this is a fantastically challenging problem. And yet, not only does it work, but in some limited settings these systems perform close to human accuracy, and take only hundreds of milliseconds to do so.

But how would you write an API for an image classifier? You might sit down and try to write a method in Python that takes in an image, does some crazy magic, and then returns a class label such as cat or dog. But it seems impossible to enumerate clearly, explicitly, and intuitively how you would write an algorithm to recognize these objects. This leads us to the idea of the data-driven approach.
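A sketch of the two API shapes, with hypothetical function names, may help. The explicit version has no obvious body you could hard-code; the data-driven version splits the work into training on labeled examples and predicting labels for new images.

```python
def classify_image(image):
    """Explicit approach: take in an image, return a label... but how?"""
    # ??? There is no clear sequence of rules to write here.
    return "cat"


def train(images, labels):
    """Data-driven approach, step 1: learn a model from labeled examples."""
    model = ...  # e.g. memorize the data or fit some parameters
    return model


def predict(model, test_images):
    """Data-driven approach, step 2: use the model to label new images."""
    return [...]  # predicted labels, one per test image
```

The key design change is that the hard part no longer lives in hand-written rules: it is pushed into the labeled dataset and whatever `train` learns from it.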

results matching ""

    No results matching ""