Semantic text mining in early drug discovery for type 2 diabetes

Bibliographic Details
Published in: PloS one 2020-06, Vol.15 (6), p.e0233956
Main Authors: Hansson, Lena K, Hansen, Rasmus Borup, Pletscher-Frankild, Sune, Berzins, Rudolfs, Hansen, Daniel Hvidberg, Madsen, Dennis, Christensen, Sten B, Christiansen, Malene Revsbech, Boulund, Ulrika, Wolf, Xenia Asbaek, Kjaerulff, Sonny Kim, van de Bunt, Martijn, Tulin, Soren, Jensen, Thomas Skot, Wernersson, Rasmus, Jensen, Jan Nygaard
Format: Article
Language: English
Description
Summary: We extracted over 7 million n-grams from PubMed abstracts and clustered the roughly 240,000 linked to type 2 diabetes (T2D) into almost 50,000 T2D-relevant 'semantic concepts'. To score papers, we weighted the concepts based on their co-mentioning with core T2D proteins. A protein's T2D relevance was determined by combining the scores of the papers that mentioned it in the five preceding years. Each week, all proteins were ranked by T2D relevance, and the historical distribution of week-to-week rank changes was used to calculate the significance of each protein's change in rank. We show that relevant semantic concepts prioritised T2D-relevant papers, even those that did not mention T2D explicitly. Well-known T2D proteins were therefore enriched among the top-scoring proteins. Our 'high jumpers' identified important past developments in the understanding of how certain key proteins relate to T2D, indicating that our method will alert us to future breakthroughs. In summary, this project helped us keep up with current T2D research by repeatedly feeding short lists of potential novel targets into our early drug discovery pipeline.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0233956
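
The scoring and ranking steps outlined in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the function names, the toy concept weights, the placeholder protein names, summation as the way paper scores are "combined", a window counted in whole calendar years, and an empirical p-value with add-one smoothing as the significance measure for a jump in rank.

```python
from bisect import bisect_left
from collections import defaultdict


def score_paper(concepts, concept_weights):
    """Score one paper as the sum of the weights of its distinct concepts."""
    return sum(concept_weights.get(c, 0.0) for c in set(concepts))


def protein_relevance(papers, concept_weights, current_year, window_years=5):
    """Combine (assumption: sum) the scores of recent papers per protein."""
    relevance = defaultdict(float)
    for paper in papers:
        # Assumption: the window covers the current year and the four before it.
        if 0 <= current_year - paper["year"] < window_years:
            score = score_paper(paper["concepts"], concept_weights)
            for protein in paper["proteins"]:
                relevance[protein] += score
    return dict(relevance)


def rank_proteins(relevance):
    """Rank proteins by descending relevance; ties are broken by name."""
    ordered = sorted(relevance, key=lambda p: (-relevance[p], p))
    return {protein: i + 1 for i, protein in enumerate(ordered)}


def jump_p_value(rank_gain, historical_gains):
    """Empirical one-sided p-value for a week-to-week gain in rank.

    Counts historical gains at least as large as the observed one.
    The add-one smoothing is an assumption, not taken from the paper.
    """
    hist = sorted(historical_gains)
    n_at_least = len(hist) - bisect_left(hist, rank_gain)
    return (n_at_least + 1) / (len(hist) + 1)


# Toy data: weights, papers and protein names are hypothetical placeholders.
concept_weights = {"insulin secretion": 2.0, "beta cell": 1.5, "glucose uptake": 1.0}
papers = [
    {"year": 2019, "concepts": ["insulin secretion", "beta cell"],
     "proteins": ["PROT_A", "PROT_B"]},
    {"year": 2018, "concepts": ["glucose uptake"], "proteins": ["PROT_C"]},
    {"year": 2010, "concepts": ["insulin secretion"], "proteins": ["PROT_C"]},  # outside the window
]

relevance = protein_relevance(papers, concept_weights, current_year=2020)
ranks = rank_proteins(relevance)
p_value = jump_p_value(3, [0, -1, 1, 0, 2, -2, 0, 1, 5])
```

In this sketch a 'high jumper' would be a protein whose gain in rank between two consecutive weekly rankings has a small `jump_p_value` relative to the historical gains; the threshold for "small" is left open because the summary does not state one.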