Text Categorization by a Machine-Learning-Based Term Selection
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | Term selection is one of the main tasks in Information Retrieval and Text Categorization. It has traditionally been carried out by statistical methods based on how frequently words appear in the documents. This paper presents a method for extracting the relevant words of a document by taking their linguistic information into account. These relevant words are obtained by a Machine Learning algorithm trained on a set of manually selected words. Text Categorization is then performed with Support Vector Machines using the lexica obtained by this technique. The results are compared with one of the most widely used term selection methods (based solely on statistical information), and the new method is found to perform better, with the additional advantage of selecting the filtering level automatically. |
ISSN: | 0302-9743, 1611-3349 |
DOI: | 10.1007/978-3-540-30075-5_25 |
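The summary describes a two-stage pipeline: a classifier learns which words are relevant from a hand-labelled word list, and the resulting lexicon feeds a Support Vector Machine text categorizer. The abstract does not name the learning algorithm, the linguistic features, or the statistical baseline, so the following is only a minimal sketch under stated assumptions. It uses a plain perceptron over hypothetical word features (a coarse part-of-speech one-hot, document frequency, collection frequency), and swaps in a small Pegasos-style linear SVM written from scratch in place of whatever SVM implementation the authors used. All names (`POS_TAGS`, `word_features`, `select_terms`, `train_linear_svm`, and the toy corpus) are hypothetical.

```python
from collections import Counter

# Hypothetical coarse POS tag set; the paper's real linguistic features are not given.
POS_TAGS = ("NOUN", "VERB", "ADJ", "ADV", "OTHER")


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def word_features(word, pos, docs):
    """Assumed features: POS one-hot, document frequency, collection frequency, bias."""
    n_docs = len(docs)
    n_tokens = sum(len(d) for d in docs)
    df = sum(1 for d in docs if word in d)      # number of documents containing the word
    cf = sum(d.count(word) for d in docs)       # total occurrences in the collection
    onehot = [1.0 if pos == tag else 0.0 for tag in POS_TAGS]
    return onehot + [df / n_docs, cf / n_tokens, 1.0]


def train_perceptron(X, y, epochs=20):
    """Plain perceptron; labels are +1 (relevant word) or -1 (not relevant)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * dot(w, x) <= 0:          # misclassified or on the boundary
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w


def select_terms(vocab_pos, docs, w):
    """Keep every word the learned model labels relevant (no top-k cutoff)."""
    return {word for word, pos in vocab_pos.items()
            if dot(w, word_features(word, pos, docs)) > 0}


def vectorize(doc, lexicon):
    """Bag-of-words counts restricted to the selected lexicon."""
    counts = Counter(doc)
    return [float(counts[term]) for term in lexicon]


def train_linear_svm(X, y, lam=0.1, epochs=20):
    """Pegasos-style subgradient descent on the regularized hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1    # margin check uses the current weights
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


# Toy corpus (hypothetical): two sports and two politics documents.
docs = [
    ["the", "goal", "of", "the", "match"],
    ["a", "goal", "in", "the", "league"],
    ["the", "vote", "of", "the", "senate"],
    ["a", "vote", "in", "the", "election"],
]
labels = [1, 1, -1, -1]
vocab_pos = {word: "OTHER" for word in ("the", "of", "a", "in")}
vocab_pos.update({word: "NOUN" for word in
                  ("goal", "match", "league", "vote", "senate", "election")})

# Manually labelled training words for the term-selection classifier.
train_words = [("goal", 1), ("vote", 1), ("the", -1), ("of", -1)]
X = [word_features(word, vocab_pos[word], docs) for word, _ in train_words]
y = [label for _, label in train_words]
term_w = train_perceptron(X, y)

lexicon = sorted(select_terms(vocab_pos, docs, term_w))
doc_X = [vectorize(d, lexicon) for d in docs]
svm_w = train_linear_svm(doc_X, labels)
print(lexicon)                                  # -> ['election', 'goal', 'league', 'match', 'senate', 'vote']
print([predict(svm_w, x) for x in doc_X])       # -> [1, 1, -1, -1]
```

Because `select_terms` keeps every word the classifier marks as relevant, the size of the lexicon follows from the learned model rather than from a hand-chosen cutoff, as it would with a frequency-ranked top-k list. This is one plausible reading of "automatically selecting the filtering level", not necessarily the paper's mechanism.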