
Zachycení výstavby textu v Pražském závislostním korpusu [Capturing text structure in the Prague Dependency Treebank]

Bibliographic Details
Published in: Slovo a slovesnost 2015, Vol. 76 (3), p. 163-197
Main Authors: Poláková, Lucie; Mírovský, Jiří; Jínová, Pavlína; Zikánová, Šárka; Hajičová, Eva; Rysová, Magdaléna; Nedoluzhko, Anna
Format: Article
Language: English
Description
Summary: Language corpus annotation schemes now cover various layers of sentence description, from morphology to semantics. Annotation projects concerning phenomena beyond the sentence boundary, however, have only recently begun to attract the attention of corpus linguists. In the present contribution, we describe a unified approach to the analysis of discourse phenomena, developed for large-scale annotation of Czech empirical data in the Prague Dependency Treebank. This approach rests on two fundamental pillars: (i) it exploits the results of one of the first complex schemes for discourse annotation, proposed and realized in the Penn Discourse Treebank for English; (ii) it follows the Praguian Functional Generative Description and treebanking tradition, taking advantage of the tectogrammatical (underlying) layer of sentence analysis and extending it to a full discourse-level description. Our analysis concentrates on two major aspects of discourse coherence: (i) discourse relations (semantic relations between discourse segments) and discourse connectives as their lexical anchors; and (ii) coreference and so-called bridging anaphora. We present a detailed description of the annotation scheme and procedure, address individual problematic issues, and offer basic corpus statistics and an evaluation of the annotation.
ISSN: 0037-7031, 2571-0885