The Download Estimation task on KDD Cup 2003

Bibliographic Details
Published in: SIGKDD Explorations, 2003-12, Vol. 5 (2), p. 160-162
Main Authors: Brank, Janez; Leskovec, Jure
Format: Article
Language: English
Description
Summary: This paper describes our work on the Download Estimation task for KDD Cup 2003. The task is to estimate how many times a paper is downloaded in the first 60 days after its publication on arXiv.org, a preprint server for papers on physics and related areas. The training data consists of approximately 29,000 papers, the citation graph, and download information for a subset of these papers. Our approach is based on an extension of the bag-of-words model, with linear SVM regression as the learning algorithm. We describe our experiments with various kinds of features, focusing particularly on feature construction and weighting, which turn out to be quite important for this task.
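As a rough illustration of the kind of pipeline the summary names (bag-of-words features fed to a linear regressor trained with an SVM-style loss), here is a minimal sketch in plain Python. It is not the authors' implementation: the toy texts and target values are made up for illustration, the function names (`bag_of_words`, `train_linear_svr`, `estimate`) and all hyperparameters are assumptions, and plain subgradient descent on the epsilon-insensitive loss stands in for whatever SVM solver the paper used.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: (text, made-up target value). Not from the paper.
TOY_PAPERS = [
    ("quantum gravity string theory", 5.0),
    ("string theory duality", 4.0),
    ("lattice simulation numerical", 1.0),
    ("numerical lattice methods", 1.5),
]


def bag_of_words(text):
    """Map a text to a sparse {term: weight} vector with unit L2 norm."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def predict(weights, bias, features):
    """Linear model: bias plus the sparse dot product of weights and features."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in features.items())


def train_linear_svr(examples, epsilon=0.1, reg=1e-4, lr=0.05, epochs=1000, seed=0):
    """Subgradient descent on the epsilon-insensitive loss with L2 shrinkage."""
    rng = random.Random(seed)
    data = [(bag_of_words(text), target) for text, target in examples]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, target in data:
            error = predict(weights, bias, features) - target
            # L2 regularisation: shrink every weight slightly on each step.
            for term in weights:
                weights[term] *= 1.0 - lr * reg
            # Inside the epsilon tube the loss is zero, so nothing else changes.
            if abs(error) > epsilon:
                sign = 1.0 if error > 0 else -1.0
                for term, value in features.items():
                    weights[term] = weights.get(term, 0.0) - lr * sign * value
                bias -= lr * sign
    return weights, bias


WEIGHTS, BIAS = train_linear_svr(TOY_PAPERS)


def estimate(text):
    """Estimate the target value for an unseen text with the trained model."""
    return predict(WEIGHTS, BIAS, bag_of_words(text))


if __name__ == "__main__":
    for query in ("string theory", "numerical lattice"):
        print(query, round(estimate(query), 2))
```

The summary stresses feature construction and weighting; in a sketch like this, that corresponds to what `bag_of_words` emits (which terms, and with what weights), while the learner itself stays unchanged.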
ISSN: 1931-0145, 1931-0153
DOI: 10.1145/980972.980997