How Wikipedia disease information evolve over time? An analysis of disease-based articles changes

Bibliographic Details
Published in: Information Processing & Management, 2020-05, Vol. 57 (3), p. 102225, Article 102225
Main Authors: Lagunes-García, Gerardo, Rodríguez-González, Alejandro, Prieto-Santamaría, Lucía, García del Valle, Eduardo P., Zanin, Massimiliano, Menasalvas-Ruiz, Ernestina
Format: Article
Language: English
Description
Summary:
• Development of an analysis of how English-language Wikipedia disease-related articles evolve over time.
• Historical information is extracted from the articles, and the changes in their content (references, characters, diagnosis-related terms) over time are analysed.
• Most articles increase their content over time, and this growth is not influenced by the number of references.
• Hot topics/diseases attract the highest numbers of edits and views.
• Less well-known diseases show abrupt changes, which might be a consequence of new knowledge about them.

Wikipedia, also known as "The Free Encyclopaedia", is one of the largest online repositories of biomedical information in the world, and is increasingly being used by medical researchers and health professionals alike. In spite of its rising popularity, little attention has been devoted to understanding how such medical information is organised and, especially, how it evolves over time. We here present an analysis aimed at characterising this evolution, with a focus on the effects that such dynamics may have on an automated knowledge extraction process. To that end, we start from a data set comprising a large number of snapshots of Wikipedia's disease articles and the corresponding diagnostic elements as provided by the DISNET project (disnet.ctb.upm.es). We then track and analyse how different metrics evolve over time, such as the total article length and the number of medical terms and references. Results highlight some expected facts: for instance, that most articles increase their content over time, and that hot topics, such as Alzheimer's disease, attract the highest numbers of edits and views. On the other hand, relevant behaviours are observed for less well-known diseases, including abrupt changes in the text and the concentration of contributions among a handful of editors.
These results stress the importance of using correctly filtered and up-to-date datasets and, more generally, of considering the temporal evolution of the information in Wikipedia.
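The abstract names three kinds of measurement: per-snapshot content metrics (article length, references, medical terms), abrupt changes between snapshots, and the concentration of edits among a few editors. The following is a minimal sketch of how such measurements could be computed, not the authors' DISNET pipeline. It assumes snapshots are given as (timestamp, raw wikitext) pairs; the `<ref` tag count, the plain dictionary matching of terms, the 0.5 relative-length threshold, and the top-k editor share are all hypothetical stand-ins chosen for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical heuristic: count opening <ref> tags in raw wikitext, including
# self-closing reuses such as <ref name="x" />; </ref> and <references /> are not matched.
REF_PATTERN = re.compile(r"<ref[\s>/]", re.IGNORECASE)


@dataclass
class SnapshotMetrics:
    timestamp: str
    characters: int
    references: int
    diagnostic_terms: int


def snapshot_metrics(timestamp, wikitext, vocabulary):
    """Compute simple content metrics for one article snapshot.

    `vocabulary` is a hypothetical list of diagnostic terms; whole-word,
    case-insensitive matching stands in for a real term-extraction tool.
    """
    lowered = wikitext.lower()
    terms = sum(
        len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", lowered))
        for term in vocabulary
    )
    return SnapshotMetrics(
        timestamp=timestamp,
        characters=len(wikitext),
        references=len(REF_PATTERN.findall(wikitext)),
        diagnostic_terms=terms,
    )


def abrupt_changes(metrics, threshold=0.5):
    """Flag consecutive snapshots whose length changes by more than `threshold` (relative)."""
    flagged = []
    for prev, curr in zip(metrics, metrics[1:]):
        if prev.characters == 0:
            continue  # relative change is undefined for an empty previous snapshot
        rel = abs(curr.characters - prev.characters) / prev.characters
        if rel > threshold:
            flagged.append((prev.timestamp, curr.timestamp, round(rel, 2)))
    return flagged


def top_editor_share(editors, k=3):
    """Fraction of all edits made by the k most active editors (one list entry per edit)."""
    if not editors:
        return 0.0
    counts = Counter(editors)
    return sum(n for _, n in counts.most_common(k)) / len(editors)


if __name__ == "__main__":
    vocab = ["fever", "headache", "rash"]
    snaps = [
        ("2018-01", "Fever and headache.<ref>A</ref>"),
        ("2019-01", 'Fever and headache. Rash may appear.<ref>A</ref><ref name="b">B</ref>'),
        ("2020-01", "Fever."),
    ]
    history = [snapshot_metrics(ts, text, vocab) for ts, text in snaps]
    for m in history:
        print(m)
    print(abrupt_changes(history))
    print(top_editor_share(["a", "a", "b", "c", "a", "d"], k=1))
```

On the toy history above, both transitions are flagged as abrupt because the text more than doubles and then shrinks to a single word. With real data, the threshold and the term vocabulary would need to be tuned to the corpus rather than taken from this sketch.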
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2020.102225