CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)
Main Authors: | , , |
---|---|
Format: | Dataset |
Language: | Spanish |
Subjects: | |
Summary: | A collection of 1200 texts (292173 tokens) about clinical trial studies and clinical trial announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trial announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.
Texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System:
  • ANAT: anatomy
  • CHEM: pharmacological and chemical substances
  • DEVI: medical devices
  • DISO: pathologic conditions
  • LIVB: living beings, including human beings
  • PHYS: physiological processes
  • PROC: lab tests, diagnostic or therapeutic procedures
- Medical drug information:
  • Contraindicated: a contraindicated drug or treatment
  • Dose: dose or strength
  • Form: dosage form
  • Route: administration route or mode
- Temporal expressions:
  • Age
  • Date
  • Duration
  • Frequency
  • Time
- Miscellaneous medical entities:
  • Concept: abstract concepts, statistical tests or measurement scales
  • Food: foods or drinks
  • Observation: medical observations or clinical findings
  • Quantifier_or_Qualifier: quantifier or qualifier adjective
  • Result_or_Value: result or value of a measurement, laboratory analysis or procedure
- Negation/Speculation:
  • Neg_cue: negation cue
  • Negated: negated event
  • Spec_cue: speculation cue
  • Speculated: speculated or uncertain event
- Attributes:
  • Temporality:
    ◦ History_of: past event
    ◦ Future: future event
  • Experiencer:
    ◦ Patient: patient or participant in a clinical trial
    ◦ Family_member
    ◦ Other: a person other than the patient or a family member
86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide the files of the version revised by medical professionals (Reference folder).
- 100 clinical cases with a Creative Commons license: we provide the files of the version revised by medical professionals (Reference folder). These data come from:
• Urgencias Bidasoa (https://urgenciasbidasoa.wordpres |
---|---|
ISSN: | 1471-2105 |
DOI: | 10.5281/zenodo.13880598 |
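
The summary reports inter-annotator agreement as a strict-match F1-score. The record does not describe the file format or the evaluation script, so the following is only a minimal sketch of how such a score can be computed. It assumes each annotation is reduced to a hypothetical `(start, end, label)` triple, and that "strict match" means two annotations agree only when both character offsets and the label are identical. The `Entity` type, the `strict_f1` function and the sample offsets are illustrative and do not come from the dataset.

```python
from typing import Iterable, NamedTuple


class Entity(NamedTuple):
    """Hypothetical annotation: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def strict_f1(reference: Iterable[Entity], predicted: Iterable[Entity]) -> float:
    """F1-score where a match requires identical offsets and identical label."""
    ref = set(reference)
    pred = set(predicted)
    if not ref and not pred:
        # Two empty annotation sets agree trivially.
        return 1.0
    true_positives = len(ref & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example: the annotators disagree on the label of the second span.
annotator_a = [Entity(0, 9, "DISO"), Entity(15, 26, "CHEM"), Entity(30, 36, "Dose")]
annotator_b = [Entity(0, 9, "DISO"), Entity(15, 26, "PROC"), Entity(30, 36, "Dose")]

score = strict_f1(annotator_a, annotator_b)
print(f"strict-match F1 = {score:.2f}")
```

With two of three annotations matching on each side, precision and recall are both 2/3, so the F1-score is also 2/3. Treating one annotator as the reference is a common convention for pairwise agreement; since F1 is symmetric in precision and recall, swapping the two annotators gives the same value.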