CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)

Bibliographic Details
Main Authors: Campillos-Llanos, Leonardo, Valverde-Mateos, Ana, Capllónch-Carrión, Adrián
Format: Dataset
Language: Spanish
Description
Summary: A collection of 1200 texts (292173 tokens) about clinical trial studies and clinical trial announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. those available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trial announcements published in the European Clinical Trials Register and the Repositorio Español de Estudios Clínicos.

Texts were annotated with the following entity types and attributes:
- Semantic groups from the Unified Medical Language System (UMLS):
  • ANAT: anatomy
  • CHEM: pharmacological and chemical substances
  • DEVI: medical devices
  • DISO: pathologic conditions
  • LIVB: living beings, including human beings
  • PHYS: physiological processes
  • PROC: lab tests, diagnostic or therapeutic procedures
- Medical drug information:
  • Contraindicated: a contraindicated drug or treatment
  • Dose: dose or strength
  • Form: dosage form
  • Route: administration route or mode
- Temporal expressions:
  • Age
  • Date
  • Duration
  • Frequency
  • Time
- Miscellaneous medical entities:
  • Concept: abstract concepts, statistical tests or measurement scales
  • Food: foods or drinks
  • Observation: medical observations or clinical findings
  • Quantifier_or_Qualifier: quantifier or qualifier adjective
  • Result_or_Value: result or value of a measurement, laboratory analysis or procedure
- Negation/Speculation:
  • Neg_cue: negation cue
  • Negated: negated event
  • Spec_cue: speculation cue
  • Speculated: speculated or uncertain event
- Attributes:
  • Temporality:
    ◦ History_of: past event
    ◦ Future: future event
  • Experiencer:
    ◦ Patient: patient or participant in a clinical trial
    ◦ Family_member
    ◦ Other: a person other than the patient or a family member

In total, 86 389 entities and 16 590 attributes were annotated.
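As a rough illustration of how entity and attribute totals like those above could be tallied, the sketch below assumes the annotations are distributed as BRAT standoff `.ann` files, with entities on `T` lines and attributes on `A` lines. This record does not state the file format, so treat that as an assumption; the function name `count_brat_annotations` is hypothetical.

```python
from collections import Counter


def count_brat_annotations(ann_text: str) -> tuple[Counter, Counter]:
    """Count entity labels and attribute names in one .ann document.

    Assumption: BRAT standoff format, where
      T lines look like "T1<TAB>DISO 0 8<TAB>diabetes" (entity)
      A lines look like "A1<TAB>Experiencer T1 Patient" (attribute)
    Other line types (relations, events, notes) are ignored.
    """
    entities: Counter = Counter()
    attributes: Counter = Counter()
    for line in ann_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        ann_id, _, rest = line.partition("\t")
        if ann_id.startswith("T"):
            # Label is the first token of the middle field; this also works
            # for discontinuous spans such as "DISO 0 5;10 15".
            label = rest.split("\t", 1)[0].split(" ", 1)[0]
            entities[label] += 1
        elif ann_id.startswith("A"):
            # Attribute name is the first token, e.g. "Experiencer".
            attributes[rest.split(" ", 1)[0]] += 1
    return entities, attributes


sample = (
    "T1\tDISO 0 8\tdiabetes\n"
    "T2\tCHEM 21 31\tmetformina\n"
    "A1\tExperiencer T1 Patient\n"
)
ents, attrs = count_brat_annotations(sample)
print(ents, attrs)
```

Summing these counters over every `.ann` file would give corpus-wide totals comparable to the figures reported above, provided the format assumption holds.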
10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).

The dataset includes the texts and annotations used for the human evaluation of the medical named entity recognition tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide the files of the version revised by medical professionals (Reference folder).
- 100 clinical cases with a Creative Commons license: we provide the files revised by medical professionals (Reference folder). These data come from:
  • Urgencias Bidasoa (https://urgenciasbidasoa.wordpres
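The IAA figures above are given "in strict match". A minimal sketch of a pairwise strict-match F1, assuming each annotation is reduced to a hypothetical `(start, end, label)` tuple, is shown below; the record does not describe the exact procedure the authors used. Taking one annotator as reference gives P = agreed/|B| and R = agreed/|A|, and 2PR/(P+R) simplifies to the symmetric form 2·agreed/(|A|+|B|).

```python
# Hypothetical representation: (start offset, end offset, label).
Annotation = tuple[int, int, str]


def strict_f1(ann_a: set[Annotation], ann_b: set[Annotation]) -> float:
    """Pairwise F1 under strict match: an annotation counts as agreed only
    if both annotators marked exactly the same offsets with the same label."""
    if not ann_a and not ann_b:
        return 1.0  # nothing annotated by either side: treat as full agreement
    agreed = len(ann_a & ann_b)
    # Equivalent to 2PR/(P+R) with P = agreed/|B| and R = agreed/|A|.
    return 2 * agreed / (len(ann_a) + len(ann_b))


annotator_a = {(0, 8, "DISO"), (21, 31, "CHEM"), (40, 45, "PROC")}
annotator_b = {(0, 8, "DISO"), (21, 31, "CHEM"), (40, 47, "PROC")}
print(strict_f1(annotator_a, annotator_b))
```

Under strict match, the `PROC` spans that differ by two characters count as a disagreement, so the example scores 2·2/(3+3) ≈ 0.67; a relaxed (overlap-based) match would count them as agreeing.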
ISSN: 1471-2105
DOI: 10.5281/zenodo.13880598