What is the dataset?
This dataset contains 61 categories of food items, a subset of the 101 categories. Each food item has 3 instances, each photographed on a different day. We selected the 61 categories carefully so that a food item cannot be recognized from the background or lighting alone.
Baseline approaches
We evaluate the accuracy of standard computer vision recognition algorithms on this dataset. Specifically, we examine the accuracy with which two popular representations, color histograms and SIFT, are able to capture the image content in our fast food images. The goal is to provide standard baselines for image processing and computer vision researchers who are working in this area rather than to propose such methods as the state of the art in automated fast food recognition.
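To make the color histogram representation concrete, here is a minimal sketch in plain Python. It is not the code used for the baseline: the RGB color space, the joint (3-D) binning, the value of `bins_per_channel`, and the function name `color_histogram` are all assumptions chosen for illustration, since the page does not state these details.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram as a flat list.

    pixels: iterable of (r, g, b) tuples with integer values in 0..255.
    bins_per_channel: hypothetical setting; the source does not give one.
    """
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        # Flatten the (rb, gb, bb) triple into a single index.
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
        count += 1
    if count == 0:
        return hist
    # Normalize so that images of different sizes are comparable.
    return [h / count for h in hist]


if __name__ == "__main__":
    # Four toy pixels: two pure red, one pure blue, one pure green.
    toy = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
    hist = color_histogram(toy)
    print(len(hist), sum(hist))
```

The resulting fixed-length vector is what a classifier would consume. Because the histogram discards all spatial layout, two foods with similar overall colors produce similar vectors.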
We use the same methodology in both experiments. Twelve images of each of the 61 food types (different views of two instances) form the training set, while the six images of the third instance are held out for testing. Each instance is held out in turn, and results are averaged over this three-fold cross-validation. In particular, no instance of a food item ever appears in both the training and test sets. We train a multi-class SVM classifier on the training data using the popular libsvm package with standard parameters.
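The instance-level hold-out described above can be sketched as follows. This is an illustrative outline, not the authors' code: the `(category, instance, view)` tuple layout and the function names are hypothetical, and the SVM training step is deliberately omitted because it depends on the libsvm setup. The counts (61 categories, 3 instances, 6 views per instance) come from the description above.

```python
NUM_CATEGORIES = 61      # food categories in this subset
NUM_INSTANCES = 3        # instances per food item, one per day
VIEWS_PER_INSTANCE = 6   # images per instance (12 train + 6 test per category)


def build_index():
    """List every image as a (category, instance, view) tuple (hypothetical layout)."""
    return [
        (c, i, v)
        for c in range(NUM_CATEGORIES)
        for i in range(NUM_INSTANCES)
        for v in range(VIEWS_PER_INSTANCE)
    ]


def instance_folds(samples):
    """Yield (held_out, train, test), holding out one instance index per fold."""
    for held_out in range(NUM_INSTANCES):
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        yield held_out, train, test


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies into one reported figure."""
    return sum(fold_accuracies) / len(fold_accuracies)


if __name__ == "__main__":
    samples = build_index()
    for held_out, train, test in instance_folds(samples):
        # A classifier would be trained on `train` and scored on `test` here.
        print(held_out, len(train), len(test))
```

Splitting by instance rather than by image is the important design choice: all six views of a held-out instance leave the training set together, so the classifier is never tested on a food item it has already seen from another angle.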
Results of the baselines
Classification accuracy on the 61 categories:
Color histogram: 11.3%
Bag of SIFT features: 9.2%
Baseline Image Data: Caution! Large zip file!
Lab Still Shots [9MB]
Mei Chen
Rahul Sukthankar
Dean Pomerleau
Casey Helfrich
Intel Labs Pittsburgh
Jie Yang
Wen Wu
Lei Yang
Franziska Kraus
Anlu Wang
Carnegie Mellon University
Kapil Dev Dhingra
Columbia University