PEWeb: Product Extraction from the Web Based on Entropy Estimation

Bibliographic Details
Main Authors: Phan, Xuan Hieu; Horiguchi, Susumu; Ho, Tu Bao
Format: Conference Proceeding
Language: English
Description
Summary: Mining product descriptions (PDs) from e-commerce websites is an important task in information extraction from the Web. In this paper, we propose an efficient technique for this task. The technique first discovers the set of PDs by measuring the entropy at each internal node of the HTML tag tree. A set of association rules based on heuristic features is then applied to filter the output and thereby improve precision. Experimental results with the PEWeb system show that the proposed method substantially outperforms existing automatic techniques.
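The summary does not define the entropy measure, so the following is only a rough sketch of the general idea rather than the authors' method. It assumes (hypothetically) that a node's entropy is the Shannon entropy of the structural signatures of its children, and that a node with several children and low entropy is a candidate region of repeated product records. The names `Node`, `signature`, `child_entropy`, and `candidate_regions`, as well as both thresholds, are illustrative choices and do not come from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal stand-in for an HTML tag-tree node (hypothetical type)."""
    tag: str
    children: list = field(default_factory=list)


def signature(node):
    """Structural signature: the tag plus the signatures of its children."""
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def child_entropy(node):
    """Shannon entropy (in bits) of the distribution of child signatures.

    Assumption: this is one plausible reading of "entropy at an internal
    node"; the paper may define it differently.
    """
    counts = Counter(signature(c) for c in node.children)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def candidate_regions(root, min_children=3, max_entropy=0.5):
    """Yield internal nodes that have many structurally similar children."""
    stack = [root]
    while stack:
        node = stack.pop()
        if len(node.children) >= min_children and child_entropy(node) <= max_entropy:
            yield node
        stack.extend(node.children)


def make_item():
    """One repeated record: <li><a/><span/></li>."""
    return Node("li", [Node("a"), Node("span")])


# Toy page: a header block followed by a list of three identical records.
product_list = Node("ul", [make_item() for _ in range(3)])
page = Node("body", [Node("div", [Node("h1")]), product_list])

regions = [n.tag for n in candidate_regions(page)]
```

In this toy tree the `ul` node has three children with identical signatures, so its entropy is zero and it is the only node reported. The `body` node has two structurally different children, giving an entropy of one bit. The rule-based filtering step mentioned in the summary is not modeled here.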
DOI: 10.5555/1025132.1026390