Finding Advertising Keywords on Web Pages

Published in WWW-2006, 2006

Download paper here

A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as term frequency of each potential keyword, inverse document frequency, presence in meta-data, and how often the term occurs in search query logs. The system is trained with a set of example pages that have been hand-labeled with “relevant” keywords. Based on this training, it can then extract new keywords from previously unseen pages. Accuracy is substantially better than several baseline systems.
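
The four features named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`tokenize`, `candidate_features`), the restriction to single-word candidates, the smoothed IDF formula, and the log scaling of query-log counts are all assumptions made for illustration. The abstract also does not say which learning algorithm is used, so the sketch stops at feature extraction.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_features(page_text, meta_text, corpus, query_log_counts):
    """Return {term: feature dict} for each distinct word on the page.

    Hypothetical inputs: `corpus` is a list of background documents used
    for document frequency, and `query_log_counts` maps a term to the
    number of times it appeared in a search query log.
    """
    tokens = tokenize(page_text)
    term_counts = Counter(tokens)
    meta_terms = set(tokenize(meta_text))
    doc_term_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_term_sets)

    features = {}
    for term, count in term_counts.items():
        # Number of background documents that contain the term.
        df = sum(1 for terms in doc_term_sets if term in terms)
        features[term] = {
            # Relative term frequency on this page.
            "tf": count / len(tokens),
            # Smoothed inverse document frequency (an assumed variant).
            "idf": math.log((1 + n_docs) / (1 + df)) + 1.0,
            # 1.0 if the term also appears in the page's meta-data.
            "in_meta": 1.0 if term in meta_terms else 0.0,
            # Log-scaled query-log frequency; 0.0 for unseen terms.
            "query_log": math.log1p(query_log_counts.get(term, 0)),
        }
    return features


example = candidate_features(
    page_text="cheap flights cheap hotels",
    meta_text="flights",
    corpus=["cheap flights to paris", "weather today"],
    query_log_counts={"flights": 9},
)
```

In a trained system of the kind the abstract describes, feature vectors like these would be paired with the hand-labeled "relevant" keywords and passed to a classifier, which would then score candidates on previously unseen pages.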