
Automatic Keyphrase Extractor from Arabic Documents

Bibliographic Details
Published in: International journal of advanced computer science & applications, 2016, Vol. 7 (2)
Main Authors: M., Hassan, I., Ismail, N., Mohammed, Mahmoud, Maysa
Format: Article
Language: English
Subjects: Algorithms; Electronic documents; Information retrieval
container_issue 2
container_title International journal of advanced computer science & applications
container_volume 7
creator M., Hassan
I., Ismail
N., Mohammed
Mahmoud, Maysa
description A keyphrase is a sentence, or part of a sentence, containing a sequence of words that expresses the meaning and purpose of a given paragraph. Keyphrase extraction is the task of identifying the possible keyphrases in a given document. Many applications, including text summarization, indexing, and characterization, use keyphrase extraction, and it is an essential task for improving the performance of any information retrieval system. The Internet contains a massive number of documents, some with manually assigned keyphrases and many without. Arabic is an important world language, and the number of online Arabic documents is growing rapidly; most of them have no manually assigned keyphrases, so users must scan entire retrieved web documents. To avoid scanning each retrieved document in full, keyphrases need to be assigned to every web document, either manually or automatically. This paper addresses the problem of identifying keyphrases in Arabic documents automatically. In this work, we present a novel algorithm, Automatic Keyphrases Extraction from Arabic (AKEA), that extracts keyphrases from Arabic text automatically. To test the algorithm, we collected a dataset of 100 documents from Arabic wiki and downloaded another 56 agricultural documents from the Food and Agriculture Organization of the United Nations (F.A.O.). The evaluation results show that the system achieves 83% precision in identifying 2-word and 3-word keyphrases in the agricultural domain.
doi_str_mv 10.14569/IJACSA.2016.070226
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2656495716</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2656495716</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1176-79c9649483f2dbacc20027874f9e26512a48eaea75c2ca612bd505ff09be9c403</originalsourceid><addsrcrecordid>eNotkE1LAzEQhoMoWGp_gZcFz7tOvjfHZa1aLXhQwVvIpgm2uE1NsmD_vWvXuczAvB_wIHSNocKMC3W7emra16YigEUFEggRZ2hGMBcl5xLOT3ddYpAfl2iR0g7GoYqIms4QbYYcepO3tnh2x8NnNMkVy58cjc0hFj6Gvmii6cb_XbBD7_Y5XaELb76SW_zvOXq_X761j-X65WHVNuvSYixFKZVVgilWU082nbGWABBZS-aVI4JjYljtjDOSW2KNwKTbcODeg-qcsgzoHN1MuYcYvgeXst6FIe7HSj36x2gusRhVdFLZGFKKzutD3PYmHjUGfQKkJ0D6D5CeANFfCnBXlA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2656495716</pqid></control><display><type>article</type><title>Automatic Keyphrase Extractor from Arabic Documents</title><source>Publicly Available Content Database</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>M., Hassan ; I., Ismail ; N., Mohammed ; Mahmoud, Maysa</creator><creatorcontrib>M., Hassan ; I., Ismail ; N., Mohammed ; Mahmoud, Maysa</creatorcontrib><description>The keyphrase is a sentence or a part of a sentence that contains a sequence of words that expresses the meaning and the purpose of any given paragraph. Keyphrase extraction is the task of identifying the possible keyphrases from a given document. Many applications including text summarization, indexing, and characterization use keyphrase extraction. Also, it is an essential task to improve the performance of any information retrieval system. The internet contains a massive amount of documents that may have been manually assigned keyphrases or not. The Arabic language is an important language in the world. 
Nowadays the number of online Arabic documents is growing rapidly; and most of them have no manually assigned keyphrases, so the user will scan the whole retrieved web documents. To avoid scanning the entire retrieved document, we need keyphrases assigned to each web document manually or automatically. This paper addresses the problem of identifying keyphrases in Arabic documents automatically. In this work, we provide a novel algorithm that identified keyphrases from Arabic text. The new algorithm, Automatic Keyphrases Extraction from Arabic (AKEA), extracts keyphrases from Arabic documents automatically. In order to test the algorithm, we collected a dataset containing 100 documents from Arabic wiki; also, we downloaded another 56 agricultural documents from Food and Agricultural Organization of the United Nations (F.A.O.). The evaluation results show that the system achieves 83% precision value in identifying 2-word and 3-word keyphrases from agricultural domains.</description><identifier>ISSN: 2158-107X</identifier><identifier>EISSN: 2156-5570</identifier><identifier>DOI: 10.14569/IJACSA.2016.070226</identifier><language>eng</language><publisher>West Yorkshire: Science and Information (SAI) Organization Limited</publisher><subject>Algorithms ; Electronic documents ; Information retrieval</subject><ispartof>International journal of advanced computer science &amp; applications, 2016, Vol.7 (2)</ispartof><rights>2016. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2656495716?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,4024,25753,27923,27924,27925,37012,44590</link.rule.ids></links><search><creatorcontrib>M., Hassan</creatorcontrib><creatorcontrib>I., Ismail</creatorcontrib><creatorcontrib>N., Mohammed</creatorcontrib><creatorcontrib>Mahmoud, Maysa</creatorcontrib><title>Automatic Keyphrase Extractor from Arabic Documents</title><title>International journal of advanced computer science &amp; applications</title><description>The keyphrase is a sentence or a part of a sentence that contains a sequence of words that expresses the meaning and the purpose of any given paragraph. Keyphrase extraction is the task of identifying the possible keyphrases from a given document. Many applications including text summarization, indexing, and characterization use keyphrase extraction. Also, it is an essential task to improve the performance of any information retrieval system. The internet contains a massive amount of documents that may have been manually assigned keyphrases or not. The Arabic language is an important language in the world. Nowadays the number of online Arabic documents is growing rapidly; and most of them have no manually assigned keyphrases, so the user will scan the whole retrieved web documents. To avoid scanning the entire retrieved document, we need keyphrases assigned to each web document manually or automatically. This paper addresses the problem of identifying keyphrases in Arabic documents automatically. 
In this work, we provide a novel algorithm that identified keyphrases from Arabic text. The new algorithm, Automatic Keyphrases Extraction from Arabic (AKEA), extracts keyphrases from Arabic documents automatically. In order to test the algorithm, we collected a dataset containing 100 documents from Arabic wiki; also, we downloaded another 56 agricultural documents from Food and Agricultural Organization of the United Nations (F.A.O.). The evaluation results show that the system achieves 83% precision value in identifying 2-word and 3-word keyphrases from agricultural domains.</description><subject>Algorithms</subject><subject>Electronic documents</subject><subject>Information retrieval</subject><issn>2158-107X</issn><issn>2156-5570</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNotkE1LAzEQhoMoWGp_gZcFz7tOvjfHZa1aLXhQwVvIpgm2uE1NsmD_vWvXuczAvB_wIHSNocKMC3W7emra16YigEUFEggRZ2hGMBcl5xLOT3ddYpAfl2iR0g7GoYqIms4QbYYcepO3tnh2x8NnNMkVy58cjc0hFj6Gvmii6cb_XbBD7_Y5XaELb76SW_zvOXq_X761j-X65WHVNuvSYixFKZVVgilWU082nbGWABBZS-aVI4JjYljtjDOSW2KNwKTbcODeg-qcsgzoHN1MuYcYvgeXst6FIe7HSj36x2gusRhVdFLZGFKKzutD3PYmHjUGfQKkJ0D6D5CeANFfCnBXlA</recordid><startdate>2016</startdate><enddate>2016</enddate><creator>M., Hassan</creator><creator>I., Ismail</creator><creator>N., Mohammed</creator><creator>Mahmoud, Maysa</creator><general>Science and Information (SAI) Organization 
Limited</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7XB</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>2016</creationdate><title>Automatic Keyphrase Extractor from Arabic Documents</title><author>M., Hassan ; I., Ismail ; N., Mohammed ; Mahmoud, Maysa</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1176-79c9649483f2dbacc20027874f9e26512a48eaea75c2ca612bd505ff09be9c403</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Electronic documents</topic><topic>Information retrieval</topic><toplevel>online_resources</toplevel><creatorcontrib>M., Hassan</creatorcontrib><creatorcontrib>I., Ismail</creatorcontrib><creatorcontrib>N., Mohammed</creatorcontrib><creatorcontrib>Mahmoud, Maysa</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central 
Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer science database</collection><collection>ProQuest research library</collection><collection>Research Library (Corporate)</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal of advanced computer science &amp; applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>M., Hassan</au><au>I., Ismail</au><au>N., Mohammed</au><au>Mahmoud, Maysa</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automatic Keyphrase Extractor from Arabic Documents</atitle><jtitle>International journal of advanced computer science &amp; applications</jtitle><date>2016</date><risdate>2016</risdate><volume>7</volume><issue>2</issue><issn>2158-107X</issn><eissn>2156-5570</eissn><abstract>The keyphrase is a sentence or a part of a sentence that contains a sequence of words that expresses the meaning and the purpose of any given paragraph. Keyphrase extraction is the task of identifying the possible keyphrases from a given document. 
Many applications including text summarization, indexing, and characterization use keyphrase extraction. Also, it is an essential task to improve the performance of any information retrieval system. The internet contains a massive amount of documents that may have been manually assigned keyphrases or not. The Arabic language is an important language in the world. Nowadays the number of online Arabic documents is growing rapidly; and most of them have no manually assigned keyphrases, so the user will scan the whole retrieved web documents. To avoid scanning the entire retrieved document, we need keyphrases assigned to each web document manually or automatically. This paper addresses the problem of identifying keyphrases in Arabic documents automatically. In this work, we provide a novel algorithm that identified keyphrases from Arabic text. The new algorithm, Automatic Keyphrases Extraction from Arabic (AKEA), extracts keyphrases from Arabic documents automatically. In order to test the algorithm, we collected a dataset containing 100 documents from Arabic wiki; also, we downloaded another 56 agricultural documents from Food and Agricultural Organization of the United Nations (F.A.O.). The evaluation results show that the system achieves 83% precision value in identifying 2-word and 3-word keyphrases from agricultural domains.</abstract><cop>West Yorkshire</cop><pub>Science and Information (SAI) Organization Limited</pub><doi>10.14569/IJACSA.2016.070226</doi><oa>free_for_read</oa></addata></record>
identifier ISSN: 2158-107X
ispartof International journal of advanced computer science & applications, 2016, Vol.7 (2)
issn 2158-107X
2156-5570
language eng
recordid cdi_proquest_journals_2656495716
source Publicly Available Content Database; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Algorithms
Electronic documents
Information retrieval
title Automatic Keyphrase Extractor from Arabic Documents
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T17%3A25%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20Keyphrase%20Extractor%20from%20Arabic%20Documents&rft.jtitle=International%20journal%20of%20advanced%20computer%20science%20&%20applications&rft.au=M.,%20Hassan&rft.date=2016&rft.volume=7&rft.issue=2&rft.issn=2158-107X&rft.eissn=2156-5570&rft_id=info:doi/10.14569/IJACSA.2016.070226&rft_dat=%3Cproquest_cross%3E2656495716%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c1176-79c9649483f2dbacc20027874f9e26512a48eaea75c2ca612bd505ff09be9c403%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2656495716&rft_id=info:pmid/&rfr_iscdi=true