
CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)

A collection of 1200 texts (292173 tokens) about clinical trial studies and clinical trial announcements in Spanish: - 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO). - 700 clinical trials announc...


Bibliographic Details
Main Authors: Campillos-Llanos, Leonardo; Valverde-Mateos, Ana; Capllónch-Carrión, Adrián
Format: Dataset
Language: Spanish
Subjects: Clinical Trials; Evidence-based Medicine; Natural Language Processing; Semantic Annotation
creator Campillos-Llanos, Leonardo
Valverde-Mateos, Ana
Capllónch-Carrión, Adrián
description A collection of 1200 texts (292173 tokens) about clinical trial studies and clinical trial announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trial announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.
Texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System:
  • ANAT: anatomy
  • CHEM: pharmacological and chemical substances
  • DEVI: medical devices
  • DISO: pathologic conditions
  • LIVB: living beings, including human beings
  • PHYS: physiological processes
  • PROC: lab tests, diagnostic or therapeutic procedures
- Medical drug information:
  • Contraindicated: a contraindicated drug or treatment
  • Dose: dose or strength
  • Form: dosage form
  • Route: administration route or mode
- Temporal expressions:
  • Age
  • Date
  • Duration
  • Frequency
  • Time
- Miscellaneous medical entities:
  • Concept: abstract concepts, statistical tests or measurement scales
  • Food: foods or drinks
  • Observation: medical observations or clinical findings
  • Quantifier_or_Qualifier: quantifier or qualifier adjective
  • Result_or_Value: result or value of a measurement, laboratory analysis or procedure
- Negation/Speculation:
  • Neg_cue: negation cue
  • Negated: negated event
  • Spec_cue: speculation cue
  • Speculated: speculated or uncertain event
- Attributes:
  • Temporality:
    ◦ History_of: past event
    ◦ Future: future event
  • Experiencer:
    ◦ Patient: patient or participant in a clinical trial
    ◦ Family_member
    ◦ Other: a person other than the patient or a family member
In total, 86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide files of the version revised by medical professionals (Reference folder).
- 100 clinical cases with a Creative Commons license: we provide files of the version revised by medical professionals (Reference folder). These data come from:
  • Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/)
  • Hipocampo.org (https://www.hipocampo.org/)
  • Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are very grateful for permission to use these cases, and we acknowledge that the copyright of the contents belongs to the authors. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/).
If you use these data, please acknowledge the authors' copyright and intellectual property rights over the contents. The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License. If you use the CT-EBM-SP vs. 2 dataset, please cite it as follows: Campillos-Llanos, L., A. Valverde-Mateos & A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.
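The description reports inter-annotator agreement as an F1-score "in strict match" but does not say how the score was computed. The following is a minimal sketch of one common reading, assuming that two annotations match only when their character offsets and label are identical. The names `Span` and `strict_f1` and the toy annotations are hypothetical illustrations, not part of the dataset or its tooling.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One annotated entity: character offsets plus its label (hypothetical type)."""
    start: int
    end: int
    label: str


def strict_f1(reference: set[Span], candidate: set[Span]) -> float:
    """F1 where a match requires identical offsets AND an identical label."""
    if not reference and not candidate:
        return 1.0  # assumption: two empty annotation sets agree perfectly
    true_positives = len(reference & candidate)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): the annotators disagree on where the DISO span ends.
annotator_a = {Span(0, 9, "CHEM"), Span(14, 22, "DISO"), Span(30, 36, "Dose")}
annotator_b = {Span(0, 9, "CHEM"), Span(14, 25, "DISO"), Span(30, 36, "Dose")}

score = strict_f1(annotator_a, annotator_b)
print(round(score, 2))  # 0.67
```

Under this reading a boundary disagreement counts as both a false positive and a false negative, so strict-match scores are a conservative measure of agreement.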
doi_str_mv 10.5281/zenodo.13880598
format dataset
fullrecord <record>
<control><sourceid>datacite</sourceid><recordid>TN_cdi_datacite_primary_10_5281_zenodo_13880598</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5281_zenodo_13880598</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_5281_zenodo_138805983</originalsourceid><addsrcrecordid>eNqVj7-rwjAURoMo-HN2vaNviCbVYl0tfbgIgl0lhOYWr9SkJFrQv14fT8HV6TvDd4bD2FiKaRwlcnZH64ybynmSiHiVtFhPLpaSR1LE7Q_usn4IJyHk8nnrsUOa82y95fsdNAEi4JA6X18DuBLSiiwVuoLck64ClM5D1pBBWyBf64CGb9FQQRaBLOxrbSkcYdKgD-QsRD9D1imfJo5eO2Cz3yxPN9zoiy7ogqr2dNb-pqRQfxXqv0K9K-bfGw_EhU9N</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control>
<display><type>dataset</type><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><source>Publicly Available Content Database</source><source>PubMed Central(OpenAccess)</source><creator>Campillos-Llanos, Leonardo ; Valverde-Mateos, Ana ; Capllónch-Carrión, Adrián</creator><creatorcontrib>Campillos-Llanos, Leonardo ; Valverde-Mateos, Ana ; Capllónch-Carrión, Adrián</creatorcontrib><description>A collection of 1200 texts (292173 tokens) about clinical trial studies and clinical trial announcements in Spanish: - 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO). - 700 clinical trial announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos. Texts were annotated with the following entity types: - Semantic groups from the Unified Medical Language System: • ANAT: anatomy • CHEM: pharmacological and chemical substances • DEVI: medical devices • DISO: pathologic conditions • LIVB: living beings, including human beings • PHYS: physiological processes • PROC: lab tests, diagnostic or therapeutic procedures - Medical drug information: • Contraindicated: a contraindicated drug or treatment • Dose: dose or strength • Form: dosage form • Route: administration route or mode - Temporal expressions: • Age • Date • Duration • Frequency • Time - Miscellaneous medical entities: • Concept: abstract concepts, statistical tests or measurement scales • Food: foods or drinks • Observation: medical observations or clinical findings • Quantifier_or_Qualifier: quantifier or qualifier adjective • Result_or_Value: result or value of a measurement, laboratory analysis or procedure - Negation/Speculation: • Neg_cue: negation cue • Negated: negated event • Spec_cue: speculation cue • Speculated: speculated or uncertain event - Attributes: • Temporality: ◦ History_of: past event ◦ Future: future event • Experiencer: ◦ Patient: patient or participant in a clinical trial ◦ Family_member ◦ Other: a person other than the patient or a family member. In total, 86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match). The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool: - 100 clinical trial announcements from EudraCT not used for system development: we provide files of the version revised by medical professionals (Reference folder). - 100 clinical cases with a Creative Commons license: we provide files of the version revised by medical professionals (Reference folder). These data come from: • Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/) • Hipocampo.org (https://www.hipocampo.org/) • Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are very grateful for permission to use these cases, and we acknowledge that the copyright of the contents belongs to the authors. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/). If you use these data, please acknowledge the authors' copyright and intellectual property rights over the contents. The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License. If you use the CT-EBM-SP vs. 2 dataset, please cite it as follows: Campillos-Llanos, L., A. Valverde-Mateos &amp; A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.5281/zenodo.13880598</identifier><language>spa</language><publisher>Zenodo</publisher><subject>Clinical Trials ; Evidence-based Medicine ; Natural Language Processing ; Semantic Annotation</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-9593-8621 ; 0000-0003-1610-0770 ; 0000-0003-3040-1756</orcidid></display>
<links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784,27925</link.rule.ids></links>
<search><creatorcontrib>Campillos-Llanos, Leonardo</creatorcontrib><creatorcontrib>Valverde-Mateos, Ana</creatorcontrib><creatorcontrib>Capllónch-Carrión, Adrián</creatorcontrib><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><subject>Clinical Trials</subject><subject>Evidence-based Medicine</subject><subject>Natural Language Processing</subject><subject>Semantic Annotation</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><recordid>eNqVj7-rwjAURoMo-HN2vaNviCbVYl0tfbgIgl0lhOYWr9SkJFrQv14fT8HV6TvDd4bD2FiKaRwlcnZH64ybynmSiHiVtFhPLpaSR1LE7Q_usn4IJyHk8nnrsUOa82y95fsdNAEi4JA6X18DuBLSiiwVuoLck64ClM5D1pBBWyBf64CGb9FQQRaBLOxrbSkcYdKgD-QsRD9D1imfJo5eO2Cz3yxPN9zoiy7ogqr2dNb-pqRQfxXqv0K9K-bfGw_EhU9N</recordid><startdate>20241001</startdate><enddate>20241001</enddate><creator>Campillos-Llanos, Leonardo</creator><creator>Valverde-Mateos, Ana</creator><creator>Capllónch-Carrión, Adrián</creator><general>Zenodo</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0001-9593-8621</orcidid><orcidid>https://orcid.org/0000-0003-1610-0770</orcidid><orcidid>https://orcid.org/0000-0003-3040-1756</orcidid></search>
<sort><creationdate>20241001</creationdate><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><author>Campillos-Llanos, Leonardo ; Valverde-Mateos, Ana ; Capllónch-Carrión, Adrián</author></sort>
<facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_5281_zenodo_138805983</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>spa</language><creationdate>2024</creationdate><topic>Clinical Trials</topic><topic>Evidence-based Medicine</topic><topic>Natural Language Processing</topic><topic>Semantic Annotation</topic><toplevel>online_resources</toplevel><creatorcontrib>Campillos-Llanos, Leonardo</creatorcontrib><creatorcontrib>Valverde-Mateos, Ana</creatorcontrib><creatorcontrib>Capllónch-Carrión, Adrián</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets>
<delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery>
<addata><au>Campillos-Llanos, Leonardo</au><au>Valverde-Mateos, Ana</au><au>Capllónch-Carrión, Adrián</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><date>2024-10-01</date><risdate>2024</risdate><issn>1471-2105</issn><eissn>1471-2105</eissn><pub>Zenodo</pub><doi>10.5281/zenodo.13880598</doi><orcidid>https://orcid.org/0000-0001-9593-8621</orcidid><orcidid>https://orcid.org/0000-0003-1610-0770</orcidid><orcidid>https://orcid.org/0000-0003-3040-1756</orcidid><oa>free_for_read</oa></addata>
</record>
fulltext fulltext
identifier ISSN: 1471-2105
issn 1471-2105
1471-2105
language spa
recordid cdi_datacite_primary_10_5281_zenodo_13880598
source Publicly Available Content Database; PubMed Central(OpenAccess)
subjects Clinical Trials
Evidence-based Medicine
Natural Language Processing
Semantic Annotation
title CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T00%3A29%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Campillos-Llanos,%20Leonardo&rft.date=2024-10-01&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.5281/zenodo.13880598&rft_dat=%3Cdatacite%3E10_5281_zenodo_13880598%3C/datacite%3E%3Cgrp_id%3Ecdi_FETCH-datacite_primary_10_5281_zenodo_138805983%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true