How Wikipedia disease information evolve over time? An analysis of disease-based articles changes
Published in: | Information processing & management 2020-05, Vol.57 (3), p.102225, Article 102225 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Development of an analysis of how Wikipedia disease-related articles in English evolve over time. • Historical information is extracted from the articles, and changes in their content (references, characters, diagnostic-related terms) over time are analysed. • Most of the articles increase their content through time, and this is not influenced by the number of references. • Hot topics/diseases attract the highest number of edits and views. • Less well-known diseases show abrupt changes, which might be a consequence of new knowledge about them.
Wikipedia, also known as "The Free Encyclopaedia", is one of the largest online repositories of biomedical information in the world, and is increasingly being used by medical researchers and health professionals alike. In spite of its rising popularity, little attention has been devoted to understanding how such medical information is organised, and especially how it evolves through time. We here present an analysis aimed at characterising this evolution, with a focus on the effects that such dynamics may have on an automated knowledge extraction process. To that end, we start from a dataset comprising a large number of snapshots of Wikipedia's disease articles, and the corresponding diagnostic elements as provided by the DISNET project (disnet.ctb.upm.es). We then track and analyse how different metrics evolve through time, such as the total article length or the number of medical terms and references. Results highlight some expected facts, for instance that most articles increase their content through time, and that hot topics, such as Alzheimer's disease, attract the highest number of edits and views. On the other hand, relevant behaviours are observed for less well-known diseases, including abrupt changes in the text and the concentration of contributions in a handful of editors. These results stress the importance of using correctly filtered and up-to-date datasets and, more generally, of considering the temporal evolution of the information in Wikipedia. |
---|---|
ISSN: | 0306-4573 1873-5371 |
DOI: | 10.1016/j.ipm.2020.102225 |