Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting
Published in: | International journal of medical informatics (Shannon, Ireland), 2022-12, Vol.168, p.104880-104880, Article 104880 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Electronic radiation oncology records in an Australian setting were anonymised.
• 13 personally identifiable entities were anonymised using Microsoft Presidio.
• Presidio scored a strict and relaxed F1-score of 0.8471 and 0.8980, respectively.
• Presidio can be utilised for safe use and sharing of certain cancer data within Australia.
Electronic medical records (EMRs) contain valuable information for clinical research; however, the presence of personally identifying information (PII) restricts their use. Anonymisation of PII from EMRs enables clinical information to be shared for research purposes. Because research on the anonymisation of Australian EMRs is limited, the performance of a customised Microsoft Presidio model was evaluated on clinical documents from an Australian radiation oncology information system (OIS).
A random sample of 300 unstructured free-text clinical documents was extracted from the Prince of Wales Cancer Centre OIS for patients diagnosed with cancer of the head and neck between 2000 and 2017. Anonymisation of clinical text was performed using Microsoft Presidio, implemented in Python. Each clinical document was manually compared pre- and post-anonymisation for the identification and redaction of 13 types of PII. Model performance was evaluated using three classification criteria (correct, partial, and missed classification) to determine recall, precision, and F1-score. These three metrics were calculated under relaxed conditions, where partial classifications were considered correct, and under strict conditions, where only correct classifications were considered correct.
A total of 8,713 PII were identified, of which 7,026 (81%) were classified as correct, 850 (10%) as partial, and 837 (9%) as missed. There were 245 instances of incorrect classifications. Evaluation of the model demonstrated an average precision of 0.8921, recall (strict) of 0.8064, F1-score (strict) of 0.8471, recall (relaxed) of 0.9039, and F1-score (relaxed) of 0.8980.
This is the first example of an open-source anonymisation model to be customised and tested on clinical documents from an Australian radiation oncology EMR. These findings support the use of Presidio for the safe use and sharing of cancer data within Australia for certain PII; however, additional checks are required to ensure person names are successfully anonymised. |
---|---|
ISSN: | 1386-5056 1872-8243 |
DOI: | 10.1016/j.ijmedinf.2022.104880 |
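
The strict and relaxed figures quoted in the summary can be checked with a few lines of Python. This is a minimal sketch, not the authors' evaluation code: it assumes recall is the number of hits divided by all identified PII, that partial classifications count as hits only under relaxed conditions, and that the F1-score is the harmonic mean of precision and recall. The summary does not say how the average precision of 0.8921 was computed, so it is used as reported. All function and variable names are illustrative.

```python
# Counts and precision as reported in the summary; nothing is re-derived from raw data.
CORRECT, PARTIAL, MISSED = 7026, 850, 837
REPORTED_PRECISION = 0.8921  # taken as given (assumption: same value for strict and relaxed)


def recall(correct: int, partial: int, missed: int, relaxed: bool) -> float:
    """Fraction of PII found; partial matches count as hits only when relaxed."""
    hits = correct + partial if relaxed else correct
    return hits / (correct + partial + missed)


def f1_score(precision: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)


strict_recall = recall(CORRECT, PARTIAL, MISSED, relaxed=False)
relaxed_recall = recall(CORRECT, PARTIAL, MISSED, relaxed=True)
strict_f1 = f1_score(REPORTED_PRECISION, strict_recall)
relaxed_f1 = f1_score(REPORTED_PRECISION, relaxed_recall)

print(f"recall (strict)  = {strict_recall:.4f}")
print(f"recall (relaxed) = {relaxed_recall:.4f}")
print(f"F1 (strict)      = {strict_f1:.4f}")
print(f"F1 (relaxed)     = {relaxed_f1:.4f}")
```

Under these assumptions the script reproduces the reported values: recall of 0.8064 (strict) and 0.9039 (relaxed), and F1-scores of 0.8471 (strict) and 0.8980 (relaxed).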