A stemming algorithm for Latvian
The thesis covers the construction, application and evaluation of a stemming algorithm for advanced information searching and retrieval in Latvian databases. Its aim is to examine two questions: Can a suffix removal algorithm originally designed for English be applied to Latvian?...
Main Author: | Karlis Kreslins |
---|---|
Format: | Default Thesis |
Published: | 1996 |
Subjects: | |
Online Access: | https://hdl.handle.net/2134/7433 |
_version_ | 1818175929184157696 |
---|---|
author | Karlis Kreslins |
author_facet | Karlis Kreslins |
author_sort | Karlis Kreslins (7173986) |
collection | Figshare |
description | The thesis covers the construction, application and evaluation of a stemming algorithm for advanced information searching and retrieval in Latvian databases. Its aim is to examine two questions: Can a suffix removal algorithm originally designed for English be applied to Latvian? Can stemming in Latvian produce the same or better information retrieval results than manual truncation? To this end, the role and importance of automatic word conflation for both document indexing and information retrieval are characterised. A literature review, which analyses and evaluates different types of stemming techniques and the historical development of stemming algorithms, justifies applying this advanced IR method to Latvian as well. A comparative analysis of the morphological structure of English and Latvian led to the selection of Porter's suffix removal algorithm as the basis for the Latvian stemmer. An extensive list of Latvian stopwords, including conjunctions, particles and adverbs, was designed and added to the initial stemmer in order to eliminate insignificant words from further processing. A number of Latvian-specific modifications were made to the structure and rules of the original stemming algorithm. Analysis of word stemming based on a Latvian electronic dictionary and Latvian text fragments confirmed that the suffix removal technique can be applied successfully to Latvian. An evaluation study of user search statements revealed that the stemming algorithm can, to a certain extent, improve the effectiveness of information retrieval. |
format | Default Thesis |
id | rr-article-9414536 |
institution | Loughborough University |
publishDate | 1996 |
record_format | Figshare |
spelling | rr-article-9414536 1996-01-01T00:00:00Z A stemming algorithm for Latvian Karlis Kreslins (7173986) Other information and computing sciences not elsewhere classified Information science & librarianship Stemming algorithm Latvian Information and Computing Sciences not elsewhere classified 1996-01-01T00:00:00Z Text Thesis 2134/7433 https://figshare.com/articles/thesis/A_stemming_algorithm_for_Latvian/9414536 CC BY-NC-ND 4.0 |
spellingShingle | Other information and computing sciences not elsewhere classified Information science & librarianship Stemming algorithm Latvian Information and Computing Sciences not elsewhere classified Karlis Kreslins A stemming algorithm for Latvian |
title | A stemming algorithm for Latvian |
title_full | A stemming algorithm for Latvian |
title_fullStr | A stemming algorithm for Latvian |
title_full_unstemmed | A stemming algorithm for Latvian |
title_short | A stemming algorithm for Latvian |
title_sort | stemming algorithm for latvian |
topic | Other information and computing sciences not elsewhere classified Information science & librarianship Stemming algorithm Latvian Information and Computing Sciences not elsewhere classified |
url | https://hdl.handle.net/2134/7433 |
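
The description above outlines a two-stage pipeline: stopwords are discarded first, then suffixes are removed from the remaining words. A minimal sketch of that idea follows. It is not the thesis's algorithm: the stopword set, the suffix list, the minimum stem length and the function names `stem` and `index_terms` are all hypothetical, and the single-pass longest-match stripping used here is a simplification, not Porter's multi-step rule set.

```python
# Illustrative sketch only. The stopwords, suffixes and threshold below are
# assumptions chosen for the example; they are NOT taken from the thesis.
STOPWORDS = {"un", "vai", "bet", "arī", "jau"}

# Try longer suffixes first so that "ām" wins over "m"-less shorter endings.
SUFFIXES = sorted(["iem", "ām", "as", "us", "a", "s", "i", "u", "e"],
                  key=len, reverse=True)

MIN_STEM_LEN = 3  # never strip a word down to fewer than 3 characters


def stem(word):
    """Strip the longest matching suffix, keeping at least MIN_STEM_LEN characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Drop stopwords, then stem every remaining whitespace-separated token."""
    return [stem(token) for token in text.lower().split() if token not in STOPWORDS]
```

With these assumed lists, inflected forms such as "grāmata", "grāmatas" and "grāmatu" conflate to the same stem, which is the effect that manual truncation in a search statement tries to achieve by hand.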