Near-duplicate image retrieval

The solution is based on the concept of visual words. A visual word is defined as the descriptor of a salient point in an image. We have developed an original image descriptor with a very compact representation: a single 32-bit integer (compare this with the popular SURF descriptor, a 64-dimensional vector of 32-bit floating-point numbers!). An image can therefore be described by a bag of visual words of about 1 KB (about 250 words per image at 4 bytes each). The dictionary of visual words is learned from the image collection and consists of several million visual words. For fast retrieval we use an inverted index of images by visual words and the classic tf-idf scoring procedure. In the final stage we perform geometric verification of the matched words.
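The retrieval step described above can be illustrated with a minimal sketch of an inverted index with tf-idf scoring. This is not our production code: the class name InvertedIndex, its methods, the exact weighting formula (term frequency times log inverse document frequency) and the sample word values are assumptions chosen for illustration. Visual words are represented as plain integers, and geometric verification is left out.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Hypothetical in-memory inverted index over integer visual words."""

    def __init__(self):
        # visual word -> {image_id: number of occurrences in that image}
        self.postings = defaultdict(dict)
        # image_id -> total number of visual words in the image
        self.doc_len = {}

    def add(self, image_id, words):
        """Index one image given its bag of visual words."""
        self.doc_len[image_id] = len(words)
        for word, count in Counter(words).items():
            self.postings[word][image_id] = count

    def idf(self, word):
        """Inverse document frequency; 0.0 for words never indexed."""
        df = len(self.postings.get(word, ()))
        return math.log(len(self.doc_len) / df) if df else 0.0

    def query(self, words, top_k=10):
        """Return up to top_k (image_id, score) pairs, best first."""
        scores = defaultdict(float)
        for word, q_count in Counter(words).items():
            idf = self.idf(word)
            if idf == 0.0:
                # Unknown word, or a word present in every image.
                continue
            q_weight = q_count / len(words) * idf
            for image_id, d_count in self.postings[word].items():
                d_weight = d_count / self.doc_len[image_id] * idf
                scores[image_id] += q_weight * d_weight
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_k]


# Toy usage with made-up 32-bit word values.
index = InvertedIndex()
index.add("img_a", [0x0001, 0x0002, 0x0003, 0x0004])
index.add("img_b", [0x0001, 0x0002, 0x0009, 0x0009])
index.add("img_c", [0x0007, 0x0008, 0x0009, 0x0009])
candidates = index.query([0x0001, 0x0002, 0x0003])
```

Only images that share at least one visual word with the query are touched, which is what keeps lookup cheap on a large collection. In a full pipeline the top candidates returned here would then go through geometric verification.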

Some numbers

  • Image collection size: 100,000,000 images
  • Test hardware: single server with 64 GB RAM and 2 Xeon CPUs with 12 cores each
  • Retrieval performance: latency 300 ms, throughput 100 queries/sec
  • Indexing performance: latency 100 ms, throughput 150 queries/sec
  • Quality evaluation results: recall 0.949, precision 0.989
  • Data storage type: NoSQL database