INRIA Person Dataset


This dataset was collected as part of research work on the detection of upright people in images and video. The research is described in detail in the CVPR 2005 paper "Histograms of Oriented Gradients for Human Detection" and in my PhD thesis. The dataset is provided in two formats: (a) original images with corresponding annotation files, and (b) positive images in normalized 64x128 pixel format (as used in the CVPR paper) together with the original negative images.

Contributions

The data set contains images from several different sources.

Original Images

Folders 'Train' and 'Test' correspond, respectively, to the original training and test images. Both folders have three subfolders: (a) 'pos' (positive training or test images), (b) 'neg' (negative training or test images), and (c) 'annotations' (annotation files for the positive images, in Pascal Challenge format).

Normalized Images

Folders 'train_64x128_H96' and 'test_64x128_H96' correspond to the normalized dataset as used in the above-referenced paper. Both folders have two subfolders: (a) 'pos' (normalized positive training or test images centered on the person, together with their left-right reflections), and (b) 'neg' (the original negative training or test images). Note that images in folder 'train/pos' are 96x160 pixels (a margin of 16 pixels on each side), and images in folder 'test/pos' are 70x134 pixels (a margin of 3 pixels on each side). This was done to avoid boundary effects (and thus any particular bias in the classifier). In both folders, use the centered 64x128 pixel window for the original detection task.
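The centered window can be extracted with simple arithmetic. The following sketch (a minimal illustration, not part of the dataset tools) computes the crop box for the centered 64x128 window given the padded image size:

```python
def center_window_box(img_w, img_h, win_w=64, win_h=128):
    """Return (left, top, right, bottom) of the centered detection window."""
    left = (img_w - win_w) // 2
    top = (img_h - win_h) // 2
    return (left, top, left + win_w, top + win_h)

# 96x160 training positives: a 16-pixel margin on each side.
print(center_window_box(96, 160))   # (16, 16, 80, 144)

# 70x134 test positives: a 3-pixel margin on each side.
print(center_window_box(70, 134))   # (3, 3, 67, 131)
```

The returned tuple can be passed directly to an image library's crop routine (e.g. Pillow's `Image.crop`).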

Negative windows

To generate negative training windows from the normalized images, a fixed set of 12180 windows (10 windows per negative image) is sampled randomly from the 1218 negative training images, providing the initial negative training set. For each detector and parameter combination, a preliminary detector is trained and all negative training images are searched exhaustively (over a scale-space pyramid) for false positives ('hard examples'). All examples with a score greater than zero are considered hard examples. The method is then re-trained on this augmented set (the initial 12180 windows plus the hard examples) to produce the final detector. The set of hard examples is subsampled if necessary, so that the descriptors of the final training set fit into 1.7 GB of RAM for SVM training.
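The initial sampling step can be sketched as follows. This is an illustration only (the image dimensions and seed are arbitrary assumptions, not the values used to build the released set): a fixed seed makes the randomly sampled window set reproducible.

```python
import random

def sample_negative_windows(rng, img_w, img_h, n=10, win_w=64, win_h=128):
    """Sample n random top-left corners for fixed-size windows inside an image."""
    return [(rng.randrange(img_w - win_w + 1),
             rng.randrange(img_h - win_h + 1)) for _ in range(n)]

rng = random.Random(0)  # fixed seed -> the same "fixed set" on every run
# 10 windows from each of the 1218 negative training images -> 12180 windows.
windows = [w for _ in range(1218)
           for w in sample_negative_windows(rng, 320, 240)]
print(len(windows))  # 12180
```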

The starting scale in the scale-space pyramid above is 1, and we keep adding levels to the pyramid as long as floor(ImageWidth/Scale)>64 and floor(ImageHeight/Scale)>128. The scale ratio between two consecutive levels of the pyramid is 1.2. The window stride (sampling distance between two consecutive windows) at any scale is 8 pixels. If, after fitting all windows at a scale level, some margin remains at the borders, we divide the margin by 2, take its floor, and shift the whole window grid by that amount. For example, if the image size at the current level is (75,130), the remaining margin (with a stride of 8 and a window size of 64x128) is (3,2), so we shift all windows by (floor(MarginX/2), floor(MarginY/2)). The image width and height at each level are calculated as NewWidth=floor(OrigWidth/Scale) and NewHeight=floor(OrigHeight/Scale), where Scale=1 gives the original image size.
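The pyramid and window-grid rules above can be sketched in a few lines. This is a minimal re-implementation of the description for illustration, not the original code; function names are my own:

```python
import math

def pyramid_scales(img_w, img_h, ratio=1.2, win_w=64, win_h=128):
    """Scales at which the rescaled image still satisfies the size condition."""
    scales, s = [], 1.0
    while math.floor(img_w / s) > win_w and math.floor(img_h / s) > win_h:
        scales.append(s)
        s *= ratio
    return scales

def window_grid(level_w, level_h, stride=8, win_w=64, win_h=128):
    """Top-left corners of all windows at one level, grid centered in the margin."""
    nx = (level_w - win_w) // stride + 1
    ny = (level_h - win_h) // stride + 1
    margin_x = level_w - (win_w + (nx - 1) * stride)
    margin_y = level_h - (win_h + (ny - 1) * stride)
    ox, oy = margin_x // 2, margin_y // 2   # shift grid by floor(margin/2)
    return [(ox + i * stride, oy + j * stride)
            for j in range(ny) for i in range(nx)]

# The (75,130) example: margin (3,2), so the whole grid is shifted by (1,1).
print(window_grid(75, 130))  # [(1, 1), (9, 1)]
```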

When testing on negative images, we use the same sampling structure to create the negative windows.
You may download the whole data set from here (970MB). To avoid duplicating images, the 'neg' image folders in 'train_64x128_H96' and 'test_64x128_H96' are referenced using symbolic links.

Disclaimer

THIS DATA SET IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The images provided above may have certain copyright issues. We make no guarantees and accept no responsibility whatsoever for any copyright issue arising from their use. Use at your own risk.

Support from the European Union 6th Framework project aceMedia is gratefully acknowledged. For any questions, comments, or other issues, please contact Navneet Dalal.