CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)
A collection of 1200 texts (292173 tokens) about clinical trials studies and clinical trials announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trials announc...
Main Authors: | Campillos-Llanos, Leonardo; Valverde-Mateos, Ana; Capllónch-Carrión, Adrián |
---|---|
Format: | Dataset |
Language: | Spanish |
Subjects: | Clinical Trials; Evidence-based Medicine; Natural Language Processing; Semantic Annotation |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Campillos-Llanos, Leonardo ; Valverde-Mateos, Ana ; Capllónch-Carrión, Adrián |
description | A collection of 1200 texts (292173 tokens) about clinical trials studies and clinical trials announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trials announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.
Texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System:
• ANAT: anatomy
• CHEM: pharmacological and chemical substances
• DEVI: medical devices
• DISO: pathologic conditions
• LIVB: living beings, including human beings
• PHYS: physiological processes
• PROC: lab tests, diagnostic or therapeutic procedures
- Medical drug information:
• Contraindicated: a contraindicated drug or treatment
• Dose: dose or strength
• Form: dosage form
• Route: administration route or mode
- Temporal expressions:
• Age
• Date
• Duration
• Frequency
• Time
- Miscellaneous medical entities:
• Concept: abstract concepts, statistical tests or measurement scales
• Food: foods or drinks
• Observation: medical observations or clinical findings
• Quantifier_or_Qualifier: quantifier or qualifier adjective
• Result_or_Value: result or value of a measurement, laboratory analysis or procedure
- Negation/Speculation:
• Neg_cue: negation cue
• Negated: negated event
• Spec_cue: speculation cue
• Speculated: speculated or uncertain event
- Attributes:
• Temporality:
◦ History_of: past event
◦ Future: future event
• Experiencer:
◦ Patient: patient or participant in a clinical trial
◦ Family_member
◦ Other: person other than the patient or a family member
86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide the files revised by medical professionals (Reference folder).
- 100 clinical cases with a Creative Commons license: we provide the files revised by medical professionals (Reference folder). These data come from:
• Urgencias Bidasoa (https://urgenciasbidasoa.wordpres |
doi_str_mv | 10.5281/zenodo.13880598 |
format | dataset |
fullrecord | <record><control><sourceid>datacite</sourceid><recordid>TN_cdi_datacite_primary_10_5281_zenodo_13880598</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5281_zenodo_13880598</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_5281_zenodo_138805983</originalsourceid><addsrcrecordid>eNqVj7-rwjAURoMo-HN2vaNviCbVYl0tfbgIgl0lhOYWr9SkJFrQv14fT8HV6TvDd4bD2FiKaRwlcnZH64ybynmSiHiVtFhPLpaSR1LE7Q_usn4IJyHk8nnrsUOa82y95fsdNAEi4JA6X18DuBLSiiwVuoLck64ClM5D1pBBWyBf64CGb9FQQRaBLOxrbSkcYdKgD-QsRD9D1imfJo5eO2Cz3yxPN9zoiy7ogqr2dNb-pqRQfxXqv0K9K-bfGw_EhU9N</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><source>Publicly Available Content Database</source><source>PubMed Central(OpenAccess)</source><creator>Campillos-Llanos, Leonardo ; Valverde-Mateos, Ana ; Capllónch-Carrión, Adrián</creator><creatorcontrib>Campillos-Llanos, Leonardo ; Valverde-Mateos, Ana ; Capllónch-Carrión, Adrián</creatorcontrib><description>A collection of 1200 texts (292173 tokens) about clinical trials studies and clinical trials announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trials announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.
Texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System:
• ANAT: anatomy
• CHEM: pharmacological and chemical substances
• DEVI: medical devices
• DISO: pathologic conditions
• LIVB: living beings, including human beings
• PHYS: physiological processes
• PROC: lab tests, diagnostic or therapeutic procedures
- Medical drug information:
• Contraindicated: a contraindicated drug or treatment
• Dose: dose or strength
• Form: dosage form
• Route: administration route or mode
- Temporal expressions:
• Age
• Date
• Duration
• Frequency
• Time
- Miscellaneous medical entities:
• Concept: abstract concepts, statistical tests or measurement scales
• Food: foods or drinks
• Observation: medical observations or clinical findings
• Quantifier_or_Qualifier: quantifier or qualifier adjective
• Result_or_Value: result or value of a measurement, laboratory analysis or procedure
- Negation/Speculation:
• Neg_cue: negation cue
• Negated: negated event
• Spec_cue: speculation cue
• Speculated: speculated or uncertain event
- Attributes:
• Temporality:
◦ History_of: past event
◦ Future: future event
• Experiencer:
◦ Patient: patient or participant in a clinical trial
◦ Family_member
◦ Other: person other than the patient or a family member
86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide the files revised by medical professionals (Reference folder).
- 100 clinical cases with a Creative Commons license: we provide the files revised by medical professionals (Reference folder). These data come from:
• Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/)
• Hipocampo.org (https://www.hipocampo.org/)
• Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are very grateful for the permission to use these cases, and we acknowledge that the copyright of the contents belongs to the authors. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/). If you use these data, please acknowledge the authors' copyright and intellectual property rights over the contents.
The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License.
If you use the CT-EBM-SP vs. 2 dataset, please cite it as follows:
Campillos-Llanos, L., A. Valverde-Mateos & A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.5281/zenodo.13880598</identifier><language>spa</language><publisher>Zenodo</publisher><subject>Clinical Trials ; Evidence-based Medicine ; Natural Language Processing ; Semantic Annotation</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-9593-8621 ; 0000-0003-1610-0770 ; 0000-0003-3040-1756</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784,27925</link.rule.ids></links><search><creatorcontrib>Campillos-Llanos, Leonardo</creatorcontrib><creatorcontrib>Valverde-Mateos, Ana</creatorcontrib><creatorcontrib>Capllónch-Carrión, Adrián</creatorcontrib><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><description>A collection of 1200 texts (292173 tokens) about clinical trials studies and clinical trials announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trials announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.
Texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System:
• ANAT: anatomy
• CHEM: pharmacological and chemical substances
• DEVI: medical devices
• DISO: pathologic conditions
• LIVB: living beings, including human beings
• PHYS: physiological processes
• PROC: lab tests, diagnostic or therapeutic procedures
- Medical drug information:
• Contraindicated: a contraindicated drug or treatment
• Dose: dose or strength
• Form: dosage form
• Route: administration route or mode
- Temporal expressions:
• Age
• Date
• Duration
• Frequency
• Time
- Miscellaneous medical entities:
• Concept: abstract concepts, statistical tests or measurement scales
• Food: foods or drinks
• Observation: medical observations or clinical findings
• Quantifier_or_Qualifier: quantifier or qualifier adjective
• Result_or_Value: result or value of a measurement, laboratory analysis or procedure
- Negation/Speculation:
• Neg_cue: negation cue
• Negated: negated event
• Spec_cue: speculation cue
• Speculated: speculated or uncertain event
- Attributes:
• Temporality:
◦ History_of: past event
◦ Future: future event
• Experiencer:
◦ Patient: patient or participant in a clinical trial
◦ Family_member
◦ Other: person other than the patient or a family member
86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide the files revised by medical professionals (Reference folder).
- 100 clinical cases with a Creative Commons license: we provide the files revised by medical professionals (Reference folder). These data come from:
• Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/)
• Hipocampo.org (https://www.hipocampo.org/)
• Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are very grateful for the permission to use these cases, and we acknowledge that the copyright of the contents belongs to the authors. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/). If you use these data, please acknowledge the authors' copyright and intellectual property rights over the contents.
The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License.
If you use the CT-EBM-SP vs. 2 dataset, please cite it as follows:
Campillos-Llanos, L., A. Valverde-Mateos & A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.</description><subject>Clinical Trials</subject><subject>Evidence-based Medicine</subject><subject>Natural Language Processing</subject><subject>Semantic Annotation</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><recordid>eNqVj7-rwjAURoMo-HN2vaNviCbVYl0tfbgIgl0lhOYWr9SkJFrQv14fT8HV6TvDd4bD2FiKaRwlcnZH64ybynmSiHiVtFhPLpaSR1LE7Q_usn4IJyHk8nnrsUOa82y95fsdNAEi4JA6X18DuBLSiiwVuoLck64ClM5D1pBBWyBf64CGb9FQQRaBLOxrbSkcYdKgD-QsRD9D1imfJo5eO2Cz3yxPN9zoiy7ogqr2dNb-pqRQfxXqv0K9K-bfGw_EhU9N</recordid><startdate>20241001</startdate><enddate>20241001</enddate><creator>Campillos-Llanos, Leonardo</creator><creator>Valverde-Mateos, Ana</creator><creator>Capllónch-Carrión, Adrián</creator><general>Zenodo</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0001-9593-8621</orcidid><orcidid>https://orcid.org/0000-0003-1610-0770</orcidid><orcidid>https://orcid.org/0000-0003-3040-1756</orcidid></search><sort><creationdate>20241001</creationdate><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><author>Campillos-Llanos, Leonardo ; Valverde-Mateos, Ana ; Capllónch-Carrión, Adrián</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_5281_zenodo_138805983</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>spa</language><creationdate>2024</creationdate><topic>Clinical Trials</topic><topic>Evidence-based Medicine</topic><topic>Natural Language Processing</topic><topic>Semantic Annotation</topic><toplevel>online_resources</toplevel><creatorcontrib>Campillos-Llanos, Leonardo</creatorcontrib><creatorcontrib>Valverde-Mateos, 
Ana</creatorcontrib><creatorcontrib>Capllónch-Carrión, Adrián</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Campillos-Llanos, Leonardo</au><au>Valverde-Mateos, Ana</au><au>Capllónch-Carrión, Adrián</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2)</title><date>2024-10-01</date><risdate>2024</risdate><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>A collection of 1200 texts (292173 tokens) about clinical trials studies and clinical trials announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trials announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.
Texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System:
• ANAT: anatomy
• CHEM: pharmacological and chemical substances
• DEVI: medical devices
• DISO: pathologic conditions
• LIVB: living beings, including human beings
• PHYS: physiological processes
• PROC: lab tests, diagnostic or therapeutic procedures
- Medical drug information:
• Contraindicated: a contraindicated drug or treatment
• Dose: dose or strength
• Form: dosage form
• Route: administration route or mode
- Temporal expressions:
• Age
• Date
• Duration
• Frequency
• Time
- Miscellaneous medical entities:
• Concept: abstract concepts, statistical tests or measurement scales
• Food: foods or drinks
• Observation: medical observations or clinical findings
• Quantifier_or_Qualifier: quantifier or qualifier adjective
• Result_or_Value: result or value of a measurement, laboratory analysis or procedure
- Negation/Speculation:
• Neg_cue: negation cue
• Negated: negated event
• Spec_cue: speculation cue
• Speculated: speculated or uncertain event
- Attributes:
• Temporality:
◦ History_of: past event
◦ Future: future event
• Experiencer:
◦ Patient: patient or participant in a clinical trial
◦ Family_member
◦ Other: person other than the patient or a family member
86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide the files revised by medical professionals (Reference folder).
- 100 clinical cases with a Creative Commons license: we provide the files revised by medical professionals (Reference folder). These data come from:
• Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/)
• Hipocampo.org (https://www.hipocampo.org/)
• Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are very grateful for the permission to use these cases, and we acknowledge that the copyright of the contents belongs to the authors. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/). If you use these data, please acknowledge the authors' copyright and intellectual property rights over the contents.
The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License.
If you use the CT-EBM-SP vs. 2 dataset, please cite it as follows:
Campillos-Llanos, L., A. Valverde-Mateos & A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.</abstract><pub>Zenodo</pub><doi>10.5281/zenodo.13880598</doi><orcidid>https://orcid.org/0000-0001-9593-8621</orcidid><orcidid>https://orcid.org/0000-0003-1610-0770</orcidid><orcidid>https://orcid.org/0000-0003-3040-1756</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1471-2105 |
ispartof | |
issn | 1471-2105 1471-2105 |
language | spa |
recordid | cdi_datacite_primary_10_5281_zenodo_13880598 |
source | Publicly Available Content Database; PubMed Central(OpenAccess) |
subjects | Clinical Trials ; Evidence-based Medicine ; Natural Language Processing ; Semantic Annotation |
title | CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2) |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T00%3A29%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Campillos-Llanos,%20Leonardo&rft.date=2024-10-01&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.5281/zenodo.13880598&rft_dat=%3Cdatacite%3E10_5281_zenodo_13880598%3C/datacite%3E%3Cgrp_id%3Ecdi_FETCH-datacite_primary_10_5281_zenodo_138805983%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
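
The description reports inter-annotator agreement as an F1-score in strict match. As a rough illustration of what such a figure measures, the sketch below computes F1 between two annotators, assuming that "strict match" means an entity counts as agreed only when its start offset, end offset and label are all identical. The tuple representation, the function name `strict_f1` and the sample spans are hypothetical; they are not taken from the dataset or from the authors' tooling.

```python
from typing import Set, Tuple

# Hypothetical representation of one annotation: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def strict_f1(reference: Set[Entity], candidate: Set[Entity]) -> float:
    """F1 under strict match: offsets and label must all be identical."""
    if not reference and not candidate:
        return 1.0  # two empty annotation sets agree trivially
    matched = len(reference & candidate)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented spans for illustration; labels follow the entity types listed in the record.
annotator_a = {(0, 9, "CHEM"), (15, 23, "DISO"), (30, 36, "Dose")}
annotator_b = {(0, 9, "CHEM"), (15, 22, "DISO"), (30, 36, "Dose")}

score = strict_f1(annotator_a, annotator_b)
print(round(score, 2))  # 0.67: the DISO spans differ by one character, so they do not match
```

Under this definition a one-character boundary disagreement is a full miss, which is why strict-match scores are usually lower than relaxed or overlap-based ones.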