
Finding Celebrities in Billions of Web Images

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2012-08, Vol. 14 (4), pp. 995-1007
Main Authors: Zhang, Xiao; Zhang, Lei; Wang, Xin-Jing; Shum, Heung-Yeung
Format: Article
Language: English
Description
Summary: In this paper, we present a face annotation system to automatically collect and label celebrity faces from the web. With the proposed system, we have constructed a large-scale dataset called "Celebrities on the Web," which contains 2.45 million distinct images of 421,436 celebrities and is orders of magnitude larger than previous datasets. Collecting and labeling such a large-scale dataset poses great challenges for current multimedia mining methods. In this work, a two-step face annotation approach is proposed to accomplish this task. In the first step, an image annotation system is proposed to label an input image with a list of celebrity names. To utilize the noisy textual data, we construct a large-scale celebrity name vocabulary to identify candidate names in the surrounding text. Moreover, we expand the scope of analysis to the surrounding text of webpages hosting near-duplicates of the input image. In the second step, the celebrity names are assigned to the faces by label propagation on a facial similarity graph. To cope with the large variance in facial appearance, a context likelihood is proposed to constrain the name assignment process. In an evaluation on 21,735 faces, both the image annotation system and the name assignment algorithm significantly outperform previous techniques.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2012.2186121
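The two-step pipeline described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the tiny name vocabulary, page texts, face ids, and similarity weights are made up; label propagation uses the common symmetric-normalized update F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2), which may differ from the paper's exact formulation; and the paper's context likelihood is not reproduced. In its place, a hypothetical one-name-per-face, one-face-per-name constraint within each image is solved by brute force.

```python
"""Hypothetical two-step face annotation sketch (not the paper's implementation)."""
import math
import re
from itertools import permutations


def candidate_names(texts, vocabulary):
    """Step 1: names from a (hypothetical) celebrity vocabulary found in any text.

    `texts` holds the hosting page's surrounding text plus the surrounding
    text of pages that host near-duplicates of the same image.
    """
    found = set()
    for text in texts:
        for name in vocabulary:
            pattern = r"\b" + re.escape(name) + r"\b"  # whole-name match only
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(name)
    return found


def propagate(edges, priors, names, alpha=0.8, iterations=50):
    """Step 2a: label propagation F <- alpha * S F + (1 - alpha) * Y.

    S = D^-1/2 W D^-1/2 is built from symmetric, positive face-similarity
    edges {(i, j): w}; priors[i] maps name -> initial score Y for face i.
    """
    n = len(priors)
    weights = [dict() for _ in range(n)]
    for (i, j), w in edges.items():
        weights[i][j] = w
        weights[j][i] = w
    degree = [sum(row.values()) for row in weights]
    scores = [dict(p) for p in priors]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            row = {}
            for name in names:
                spread = sum(
                    w / math.sqrt(degree[i] * degree[j]) * scores[j].get(name, 0.0)
                    for j, w in weights[i].items()
                )
                row[name] = alpha * spread + (1 - alpha) * priors[i].get(name, 0.0)
            updated.append(row)
        scores = updated
    return scores


def assign_names(image_faces, image_candidates, scores):
    """Step 2b: per image, pick the one-to-one face-to-name mapping (names limited
    to that image's candidates, faces may stay None) with the highest total score.

    This constraint is a hypothetical stand-in for the paper's context likelihood.
    Brute force: only practical for a handful of faces per image.
    """
    assignment = {}
    for image, faces in image_faces.items():
        options = sorted(image_candidates[image]) + [None] * len(faces)
        best, best_total = None, float("-inf")
        # dict.fromkeys drops duplicate tuples but keeps a deterministic order
        for combo in dict.fromkeys(permutations(options, len(faces))):
            total = sum(scores[f].get(name, 0.0)
                        for f, name in zip(faces, combo) if name is not None)
            if total > best_total:
                best, best_total = combo, total
        assignment.update(zip(faces, best))
    return assignment


# --- Toy usage; every name, text, and weight below is made up for illustration ---
vocabulary = ["Tom Hanks", "Meg Ryan", "Brad Pitt"]
page_texts = {
    "img1": ["Tom Hanks and Meg Ryan at the premiere"],
    "img2": ["Red carpet photos", "Tom Hanks arrives"],  # 2nd text: a near-duplicate's page
}
image_candidates = {img: candidate_names(t, vocabulary) for img, t in page_texts.items()}
image_faces = {"img1": [0, 1], "img2": [2]}  # face ids detected in each image

# Prior Y: each face starts uniform over its own image's candidate names.
priors = [{} for _ in range(3)]
for img, faces in image_faces.items():
    for f in faces:
        priors[f] = {name: 1 / len(image_candidates[img]) for name in image_candidates[img]}

edges = {(0, 2): 0.9, (1, 2): 0.1}  # hypothetical facial similarities
names = sorted(set().union(*image_candidates.values()))
scores = propagate(edges, priors, names)
result = assign_names(image_faces, image_candidates, scores)
print(result)  # {0: 'Tom Hanks', 1: 'Meg Ryan', 2: 'Tom Hanks'}
```

In this toy run, face 2 (whose only candidate is "Tom Hanks") is strongly similar to face 0, so propagation pushes face 0 toward that name; the per-image constraint then leaves "Meg Ryan" for face 1. Symmetric normalization is used here so that a weakly connected face (face 1) receives proportionally less of its neighbor's label mass than a strongly connected one.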