Evaluation Datasets for Content-Based Image Retrieval

On this page you can find two datasets that can be used to evaluate the performance of an image retrieval system. Each dataset is composed of two files, features.txt and classes.txt. The first contains the features extracted from the images: each row corresponds to one image, and values are separated by a space character. The second contains, on each row, the category that the corresponding image belongs to, expressed as a number to ease processing. The images themselves are not provided, to avoid copyright infringement. They have been manually classified, and the labels in classes.txt (numbers) have been manually assigned to best reflect real user judgements.
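The following Python sketch shows one way the two files might be read and used to score a query. It assumes three things:

- features.txt holds whitespace-separated numbers.
- classes.txt holds one numeric label per line, in the same order as features.txt.
- An image counts as relevant to a query when it shares the query's class label.

The function names (`parse_dataset`, `load_dataset`, `precision_at_k`) are hypothetical. The choice of Euclidean distance with precision at k is also an illustrative assumption; neither is part of the dataset distribution.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def parse_dataset(feature_lines: Iterable[str],
                  class_lines: Iterable[str]) -> Tuple[List[List[float]], List[int]]:
    """Parse feature rows (space-separated numbers) and one numeric label per row."""
    features = [[float(v) for v in line.split()] for line in feature_lines if line.strip()]
    labels = [int(float(line)) for line in class_lines if line.strip()]
    if len(features) != len(labels):
        raise ValueError("features and classes must have the same number of rows")
    return features, labels


def load_dataset(features_path: str = "features.txt",
                 classes_path: str = "classes.txt") -> Tuple[List[List[float]], List[int]]:
    """Read the two files of one dataset from disk (paths are assumptions)."""
    with open(features_path) as f_feat, open(classes_path) as f_cls:
        return parse_dataset(f_feat, f_cls)


def precision_at_k(features: Sequence[Sequence[float]], labels: Sequence[int],
                   query: int, k: int) -> float:
    """Fraction of the k nearest images (Euclidean distance, query excluded)
    that share the query's class label."""
    if k <= 0:
        raise ValueError("k must be positive")
    candidates = [i for i in range(len(features)) if i != query]
    candidates.sort(key=lambda i: math.dist(features[query], features[i]))
    top = candidates[:k]
    return sum(labels[i] == labels[query] for i in top) / k
```

Averaging `precision_at_k` over every image used as a query gives a single summary figure per dataset. A relevance feedback system would replace the fixed Euclidean ranking with its own.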

Feature extraction was carried out by Juan Domingo, a senior lecturer at the Computing Department of the University of Valencia.

First repository

The first repository is relatively small and was assembled specifically for testing, using some images obtained from the Web and others taken by the authors. Its 1,508 pictures have been manually classified into 29 different themes, such as flowers, horses, paintings, skies, textures, ceramic tiles, buildings, clouds and trees. Two sets of features are available for this database. The first set (features1) contains two kinds of descriptors. One is a 10 x 3 HS color histogram (30 features). The other is texture information in the form of two granulometric cumulative distribution functions [1] (10 features for each distribution, so 50 features in total). The second set of features includes the same descriptors as the second repository described below.
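The sketch below illustrates how a features1 row could be split into its three descriptors, so that each can be compared separately. It assumes the 50 columns appear in the order the descriptors are listed above: 30 histogram bins, then the two 10-value granulometric distributions. That order, the names in the hypothetical `FEATURES1_LAYOUT`, and the use of L1 distance are assumptions made only for illustration. They are not the method used in the publications listed on this page.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: 0-based, half-open column ranges, assuming the
# descriptors are stored in the order they are described in the text.
FEATURES1_LAYOUT: Dict[str, Tuple[int, int]] = {
    "hs_histogram": (0, 30),      # 10 x 3 HS color histogram
    "granulometry_1": (30, 40),   # first granulometric cumulative distribution
    "granulometry_2": (40, 50),   # second granulometric cumulative distribution
}


def split_descriptors(row: Sequence[float],
                      layout: Dict[str, Tuple[int, int]]) -> Dict[str, List[float]]:
    """Cut one feature row into named descriptor blocks."""
    width = max(end for _, end in layout.values())
    if len(row) < width:
        raise ValueError(f"expected at least {width} values, got {len(row)}")
    return {name: list(row[start:end]) for name, (start, end) in layout.items()}


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def per_descriptor_distances(a: Sequence[float], b: Sequence[float],
                             layout: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """L1 distance computed separately on each descriptor block."""
    blocks_a = split_descriptors(a, layout)
    blocks_b = split_descriptors(b, layout)
    return {name: l1_distance(blocks_a[name], blocks_b[name]) for name in layout}
```

Returning one distance per descriptor, rather than a single number, leaves open how the color and texture evidence is weighted or combined.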

Features1 Features2

Class labels

Second repository

The second repository is a subset of 5,476 images extracted from a commercial collection called "Art Explosion", distributed by the company Nova Development (http://www.novadevelopment.com). Again, these have been carefully classified by experts into 63 categories, so that images in the same category represent a similar semantic concept. A color histogram and six texture descriptors have been computed for each picture in this database: Gabor Convolution Energies [2] (features 1-12), the Gray Level Co-occurrence Matrix [3] (features 13-19), Gaussian Markov Random Fields [4] (features 20-22), the coefficients of fitting the granulometry distribution with a B-spline basis [5], a 10 x 3 HS color histogram (features 55-84), and two versions of the Spatial Size Distribution [6], using horizontal and vertical segments as structuring elements (features 85-94 and 95-104, respectively). The granulometry features are arranged as follows: features 23-26 are statistical values; features 27-37 are values of the granulometric size distribution function F at even sizes, in pixels, in [0, 20]; and features 38-48 are density values at the same points.
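The column ranges above are 1-based and inclusive. A small sketch like the following converts them into 0-based Python slices and lists any column not covered by a documented range. It assumes feature vectors have exactly 104 columns, the highest index mentioned in the text. The descriptor names used as keys are hypothetical labels. Columns 49-54 are not described above, so the sketch reports them rather than guessing their content.

```python
from typing import Dict, List, Sequence, Tuple

# 1-based, inclusive column ranges as given in the description of the second repository.
FEATURES2_RANGES: Dict[str, Tuple[int, int]] = {
    "gabor_energies": (1, 12),
    "glcm": (13, 19),
    "gmrf": (20, 22),
    "granulometry_stats": (23, 26),
    "granulometry_cdf": (27, 37),      # F at even sizes 0, 2, ..., 20 (11 points)
    "granulometry_density": (38, 48),  # density at the same 11 points
    "hs_histogram": (55, 84),          # 10 x 3 HS color histogram
    "ssd_horizontal": (85, 94),
    "ssd_vertical": (95, 104),
}


def to_slices(ranges: Dict[str, Tuple[int, int]]) -> Dict[str, slice]:
    """Convert 1-based inclusive ranges into 0-based Python slices."""
    return {name: slice(first - 1, last) for name, (first, last) in ranges.items()}


def undocumented_columns(ranges: Dict[str, Tuple[int, int]], total: int) -> List[int]:
    """Return the 1-based column numbers not covered by any range."""
    covered = set()
    for first, last in ranges.values():
        covered.update(range(first, last + 1))
    return [c for c in range(1, total + 1) if c not in covered]


def extract(row: Sequence[float], name: str) -> List[float]:
    """Pick one descriptor block out of a feature row."""
    return list(row[to_slices(FEATURES2_RANGES)[name]])
```

Checking coverage first makes it easy to confirm that a row has the expected width before applying any descriptor-specific distance.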

Features

Class labels

These two sets of images have been used in the following publications:

A Hybrid Multi-objective Optimization Algorithm for Content-based Image Retrieval

Miguel Arevalillo-Herráez, Francesc Ferri, Salvador Moreno-Picot

Applied Soft Computing Volume 13, No. 11, pp. 4358-4369, 2013 doi:10.1016/j.patrec.2008.08.003

An improved distance-based relevance feedback strategy for image retrieval

Miguel Arevalillo-Herráez, Francesc J. Ferri

Image and Vision Computing Volume 31, No. 10, pp. 704-713, 2013 doi:10.1016/j.patrec.2008.08.003

A Naive Relevance Feedback Model for Content-Based Image Retrieval using Multiple Similarity Measures

Miguel Arevalillo-Herráez, Francesc J. Ferri, Juan Domingo

Pattern Recognition Volume 43, No. 3, pp. 619-629, 2010 doi:10.1016/j.patcog.2009.08.010

Combining similarity measures in content-based image retrieval

Miguel Arevalillo-Herráez, Juan Domingo, Francesc Ferri

Pattern Recognition Letters Volume 29, No. 16, pp. 2174-2181, 2008 doi:10.1016/j.patrec.2008.08.003

A relevance feedback CBIR algorithm based on fuzzy sets

Miguel Arevalillo Herráez, Mario Zacarés, Xaro Benavent, Esther de Ves

Signal Processing: Image Communication Volume 4673/2007, pp. 490-504, 2008 doi:10.1016/j.image.2008.04.016

Interactive image retrieval using smoothed nearest neighbor estimates

Miguel Arevalillo Herráez, Francesc J. Ferri

13th International Workshop on Structural and Syntactic Pattern Recognition (S+SSPR 2010)

Lecture Notes in Computer Science, 2010, Volume 6218/2010, 708-717 doi:10.1007/978-3-642-14980-1_70

Cesme, Izmir, Turkey, August 2010

Conference

Learning Combined Similarity Measures from User Data for Image Retrieval

Miguel Arevalillo-Herráez, Francesc J. Ferri, Juan Domingo

19th International Conference on Pattern Recognition (ICPR 2008)

Conference Proceedings, Vol. 1-6, pp. 1-4 doi:10.1109/ICPR.2008.4761068

Tampa, Florida, USA, December, 2008

Conference

Usage conditions

We do not impose any conditions on the use of the datasets for research purposes; any other use is not authorized. We think it is polite to acknowledge the person who extracted the features (Juan Domingo), but this is not a condition. Nor do we require any references to our work: we believe the authors of a paper should decide whether works are related or not, and act accordingly.

References

[1] P. Soille, Morphological Image Analysis: Principles and Applications, Springer-Verlag, Berlin, 2003.

[2] G. Smith, I. Burns, Measuring texture classification algorithms, Pattern Recognition Letters 18 (14) (1997) 1495-1501.

[3] R. W. Conners, M. M. Trivedi, C. A. Harlow, Segmentation of a high-resolution urban scene using texture operators, Computer Vision, Graphics, and Image Processing 25 (3) (1984) 273-310.

[4] R. Chellappa, S. Chatterjee, Classification of textures using Gaussian Markov random fields, IEEE Transactions on Acoustics Speech and Signal Processing 33 (1985) 959-963.

[5] Y. Chen, E. Dougherty, Gray-scale morphological granulometric texture classification, Optical Engineering 33 (8) (1994) 2713-2722.

[6] G. Ayala, J. Domingo, Spatial size distributions. Applications to shape and texture analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (12) (2001) 1430-1442.